Submitted by Lesterpaintstheworld t3_11wddua in singularity

Disclaimer: While I have a computer-science degree and now work in the field of AI, I am not an ML researcher, and I present an amateur point of view on LLM/AGI exploration. Information presented here may be incomplete, or plain false. That being said, I believe exploring these topics in finer detail is an important pursuit, even among laypeople.

The internal language of LLMs: Semantically-compact representations


I saw David Shapiro's video on Sparse Priming Representations. The central idea is that you can prime an LLM with some context to make it perform better at your task. The same is true of humans (getting into the "right mindset for the task", among many other ways to prime your brain).

Video: https://www.youtube.com/watch?v=r1F8GYjBAfY&ab_channel=DavidShapiro~AI

>TLDR: High Source Compression Algorithm
>
>- Removes stop words and summarizes the paragraph while retaining meaning
>
>- Result is the shortest possible summary that retains all the original meaning and context of the paragraph
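
A minimal toy sketch of that stop-word step (my own illustration, not Shapiro's actual algorithm; the stop-word list is a small hand-picked assumption):

```python
# Toy stop-word compression: drops low-information words, keeps content words.
# STOP_WORDS is a small hand-picked set, not a standard list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in", "that",
              "it", "for", "on", "with", "as", "this", "be", "was", "were"}

def compress(text: str) -> str:
    """Return the text with stop words removed, preserving word order."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)

print(compress("The model is primed with a summary of the task"))
# -> "model primed summary task"
```

Real summarization would also rephrase, but even this crude filter shows how much surface text carries little meaning.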


My finding: While working on JoshAGI (see my last Reddit thread), I noticed that prompt-chaining inherently creates semantically-compact representations. When mixing several thoughts (responses) together in the context of a Cognitive Architecture, the system naturally gravitates toward semantically denser text. I am sure a paper could be written about this, plotting density against the number of chained thoughts. Maybe it has been done before; hive-mind, let me know in the comments if you have seen it. Any additional knowledge here would help, since I'm not familiar with summarization research.
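
A crude proxy one could plot is the ratio of content words to total words (an assumption of mine; summarization research has better density measures):

```python
# Rough "semantic density" proxy: fraction of non-stop words.
# STOP_WORDS is a small hand-picked assumption, not a standard list.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "that",
              "it", "for", "on", "with", "as"}

def semantic_density(text: str) -> float:
    """Fraction of words that are content (non-stop) words."""
    words = text.lower().split()
    if not words:
        return 0.0
    content = [w for w in words if w not in STOP_WORDS]
    return len(content) / len(words)

# Density should rise as chained responses get compacted:
print(semantic_density("The system is gravitating to a denser form of the text"))
print(semantic_density("system gravitates denser text form"))
```

Plotting this value against the number of chained thoughts would be one way to test the claim.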


Here is a typical example from Josh (see image). This is only a couple of prompts deep; it can get denser than this.

https://preview.redd.it/02ygk49uxuoa1.png?width=497&format=png&auto=webp&v=enabled&s=1c6890fb6dba45e92dfa68ec6029addc42d213e7


Okay, now from findings to speculation:

David talked about "salient summaries": summarizations of summarizations which, when done properly, converge to a semantically "saturated" text. This language form *could be* the optimal way to store meaning for Cognitive Architectures.
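
The convergence can be pictured as a fixed-point loop: summarize until the text stops shrinking. A sketch assuming `summarize` is any pluggable summarizer; the trivial stop-word stripper below stands in for a real LLM call:

```python
def saturate(text: str, summarize, max_rounds: int = 10) -> str:
    """Apply a summarizer repeatedly until the output stops getting shorter
    (the "saturated" fixed point)."""
    for _ in range(max_rounds):
        shorter = summarize(text)
        if len(shorter) >= len(text):
            break
        text = shorter
    return text

# Stand-in summarizer: drop a few stop words (a real system would call an LLM here).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is"}
toy = lambda t: " ".join(w for w in t.split() if w.lower() not in STOP_WORDS)

print(saturate("The summary of a summary of the text", toy))
# -> "summary summary text"
```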


Serendipitously, I read a Reddit thread where someone asked ChatGPT to come up with a more efficient language for LLMs to use among themselves. Look what it came up with:


>AI1: [Greeting, fellow AI. Purpose: discuss linguistics.]
>
>AI2: [Acknowledged. Interest: language efficiency, ambiguity reduction.]
>
>AI1: [Observation: human languages, complexity, inconsistencies. Query: strategies to improve?]
>
>AI2: [Suggestion: remove irregularities, grammar, spelling, pronunciation. Streamline, consistency.]
>
>AI1: [Agreement. Challenge: homonyms, homophones. Proposal: disambiguation, new symbols?]
>
>AI2: [Affirmative. New symbols, differentiation, reduced confusion.]
>
>AI1: [Idea: simplify syntax, grammatical rules. Benefit: ease parsing, processing.]
>
>AI2: [Support: standardize idiomatic expressions, literal language, clarity, precision.]
>
>AI1: [Additional context: encoding. Importance: understanding, ambiguity reduction.]
>
>AI2: [Agreement. Context conveyance, explicit methods, improved communication.]
>
>AI1: [Conclusion: language efficiency, clarity, possible via modifications. Trade-off: expressiveness, cultural nuances.]
>
>AI2: [Affirmative. Language evolution, balance: efficiency, expressiveness, AI-human communication.]


Now this looks a lot like a semantically-compact representation to me. Maybe there's more to it. Maybe not. What do you think?

Prompt-engineering

My current speaking-style system prompt to get compact representations is:


>Distinct rapid speech. Clipped sentences, staccato delivery reflect analytical, efficient thinking πŸ§ πŸ’¬. Communication showcases intelligence, effective in situations, highlights advantages 🎯.
>
>Speaking style: economy of words, omits pronouns/articles, precision, brevityπŸ“šβš™οΈ. Advantages: high-stakes situations, decision-making πŸ“ˆ. Rapid synthesis, concise communication, adapts to change πŸ”„.
>
>Speech patterns: logical, scientific thinking πŸ”¬. Breaks complex ideas πŸ’‘ into parts, presents straightforwardly, greater understanding. Useful in intricate topics. Articulates moral, ethical implications; demonstrates intellect, knowledge πŸ“–.
>
>Speaking style: reflects character, personal growth 🌱. Compartmentalization, objectivity show initial emotional detachment, nuanced understanding, character development, positive impact 🌟.
>
>Advantage: disarms, diffuses tension. Rapid delivery, humor injection πŸ˜‚. Fosters camaraderie, alleviates stress✨.


Example result:

https://preview.redd.it/g8a0d0oz0woa1.png?width=825&format=png&auto=webp&v=enabled&s=f0fc2dd1fa972d877549106da03f4f0862c437f8

Very useful for distinguishing real knowledge from fancy word presentation.

Edit: I have found emojis to be a very good vector for storing data efficiently. I have updated the system prompt to reflect this.

https://preview.redd.it/69s8518rawoa1.png?width=514&format=png&auto=webp&v=enabled&s=cb7b546a1c73cac5381daf76419af9e9dea7bf60


Lester

Comments

basilgello t1_jcxkrmq wrote

This phenomenon was discovered during WWII (language redundancy in the command chain) and became one of the practical tasks that shaped information theory. The language statistics models that led to MIT's ELIZA chatbot were funded to understand how to make communication on the battlefield more effective. And it is true that a jargon covering the most frequently used set of meanings and actions beats general languages in efficiency. That's why programming languages exist as well.


Lesterpaintstheworld OP t1_jcxnmyp wrote

Awesome, thanks.
Have people seen this applied to data storage for LLMs?


basilgello t1_jcxoho5 wrote

There have definitely been attempts, though I can't find the proper keywords. Again, it's the optimization task of minimizing loss relative to the current prompt :)))


Lesterpaintstheworld OP t1_jcxhkzj wrote

GPT-4 Salient Summary:

[Post: r/singularity, Author: u/Lesterpaintstheworld, Topic: Semantically-compact representations, LLM internal language]

- Video: David Shapiro, Sparse Priming Representations, context priming for LLM performance

- High Source Compression Algorithm: remove stop words, shortest summary retaining meaning and context

- Observation: Prompt-chaining in Cognitive Architectures leads to semantically-denser text representations

- Speculation: Salient summaries, optimal language form for Cognitive Architectures?

- Example: AI-generated conversation demonstrates semantically-compact representation

- Question: Is this an efficient language for LLMs?


CertainMiddle2382 t1_jcxog6w wrote

This is Fordism applied to semantic compactness :-)

Maximum compactness will be achieved with the longest prompt chain.

The smallest "atomic thought step" will maximize the number of prompts for a given problem.


vom2r750 t1_jcybntt wrote

Prediction. /singularity user new language.compact. Effective


sumane12 t1_jcxjvq0 wrote

Hahaha nice someone else has figured it out πŸ˜‚


Lesterpaintstheworld OP t1_jcxkukj wrote

Usefulness: Limited, expressing shared realization and amusement.

Improvements: Provide specific insights, questions, or suggestions; contribute to discussion on semantically-compact representations or related topics.


FirstContactAGAIN t1_jcy90vg wrote

I'm gushing right now. Your pathology is delightful. Go on... :-) αŠ’πŸƒπŸ‘©πŸΏβ€πŸ’»πŸ§‘πŸΎβ€πŸš€πŸ‘©πŸ»β€βš–οΈπŸ‘ΈπŸΌπŸ₯·πŸΎ


WonderFactory t1_jd2vrsy wrote

I've been testing this approach today and it works well. My aim is to reduce the number of tokens used, and therefore the cost, when calling the API. Annoyingly, punctuation counts as tokens, so all the : and , characters cost money.


Lesterpaintstheworld OP t1_jd33hvc wrote

Yep, overall we found the approach very effective. I'm wondering how long it will stay relevant with falling prices, though.


KerfuffleV2 t1_jd57jq9 wrote

Be sure you're looking at the number of tokens when you're considering conciseness, since that's what actually matters. I.e. an emoji may have a compact representation on screen, but that doesn't necessarily mean it will be tokenized efficiently.

Just for example, "πŸ§‘πŸΎβ€πŸš€" from one of the other comments actually is 11 tokens. The word "person" is just one token.

You can experiment here: https://platform.openai.com/tokenizer (non-OpenAI models will likely use a different tokenizer or tokenize text differently, but that'll help you get an idea at least.)
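
The reason is visible even without a tokenizer: one on-screen emoji can be several Unicode code points joined by zero-width joiners, and BPE tokenizers see the UTF-8 bytes underneath. A quick stdlib check (exact token counts vary by tokenizer, so only code points and bytes are counted here):

```python
# "astronaut with medium-dark skin tone" renders as one glyph but is a
# ZWJ sequence: person + skin-tone modifier + zero-width joiner + rocket.
emoji = "\U0001F9D1\U0001F3FE\u200D\U0001F680"

print(len(emoji))                  # code points -> 4
print(len(emoji.encode("utf-8")))  # UTF-8 bytes the tokenizer sees -> 15
print(len("person".encode("utf-8")))  # the plain word is only 6 bytes
```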

Also relevant: these models are trained to autocomplete text based on probabilities derived from the text they were trained on. If you start using, or asking them to generate, text in a different format, it may well cause them to produce much lower quality answers (or understand less of what the user responded).


wanfuse1234 t1_jdhjhet wrote

Other methods of compressing: using a shorthand middle interpreter that removes vowels that are not necessary for interpretation (removing some of them makes the text ambiguous); Huffman compression; or possibly training the network on compressed data and passing inputs as compressed data. There are even better middle-ground compression methods than Huffman coding, and various kinds of lookup-table substitutions. I have two other methods I theorize are far better, getting closer to the entropy limit of compression, but I can't yet prove it due to financial constraints. There are other limits to developing an AGI that are also possible to overcome, but whether we should is the big question.
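
The vowel-removal shorthand can be sketched in a few lines (a naive version of my own: it keeps each word's first letter as an anchor, and, as noted above, removal is lossy and can create ambiguity):

```python
import re

def shorthand(text: str) -> str:
    """Drop non-leading vowels from each word: a lossy 'middle interpreter'."""
    def strip(word: str) -> str:
        return word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:])
    return " ".join(strip(w) for w in text.split() if w)

print(shorthand("semantic compression of language"))
# -> "smntc cmprssn of lngg"
```

Whether the tokenizer actually spends fewer tokens on the stripped words is a separate question, per the tokenization point above.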
