Submitted by fangfried t3_11alcys in singularity

With ChatGPT you could say it doesn’t give up-to-date, correct information all the time, but that will be mostly solved by integrating it with Bing.

It also has a lot of bias and is very limited in its freedom to really tell us what it thinks. Maybe as computing gets cheaper and open-source LLMs become as powerful as GPT, that issue can be solved.

I’m more interested, though, in whether there are any flaws with transformers and LLMs from a technical standpoint, and what those are.

15

Comments

nul9090 t1_j9sqmaf wrote

In my view, the biggest flaw of transformers is their quadratic complexity in sequence length. This basically means they will not become significantly faster anytime soon, and the context window size will grow slowly too.
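
To make "quadratic" concrete, here's a toy NumPy sketch (not any real model's code, just an illustration): every token attends to every other token, so the attention score matrix alone has n² entries.

```python
import numpy as np

def attention_scores(n_tokens: int, d_model: int = 64) -> np.ndarray:
    """Toy self-attention scores: every token attends to every other token,
    so the matrix is n_tokens x n_tokens -- memory and compute grow as n^2."""
    rng = np.random.default_rng(0)
    q = rng.standard_normal((n_tokens, d_model))  # stand-ins for real query vectors
    k = rng.standard_normal((n_tokens, d_model))  # stand-ins for real key vectors
    return q @ k.T / np.sqrt(d_model)

print(attention_scores(1_000).shape)  # (1000, 1000)
for n in (8_000, 32_000, 128_000):
    print(f"{n:>7} tokens -> {n * n:>17,} pairwise scores per head, per layer")
```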

Linear transformers and Structured State Space Sequence (S4) models are promising approaches to solve that though.

My hunch is that LLMs will be very useful in the near term but of little value to an eventual AGI architecture, though I can't convincingly explain why.

27

Tavrin t1_j9tns0j wrote

If this is true, GPT's context window is about to take a big leap forward (a 32k-token context window instead of the usual 4k, or now 8k). Still, I agree with you that current transformers don't feel like they will be the ones taking us all the way to AGI (there is still a lot of progress that can be made with them even without more computing power, and I'm sure we'll see them used for more and more crazy and useful things).

9

fangfried OP t1_j9ss98n wrote

What do you think would be an upper bound on complexity for AGI? Do we need to get it down to linear, or would O(n log n) suffice?

2

nul9090 t1_j9th1xg wrote

Well, at the moment, we can't really have any idea. Quadratic complexity is definitely really bad: it limits how far we can push the architecture and makes it hard to fit on consumer hardware. But if we are as close to a breakthrough as some people believe, maybe it isn't a problem.

6

dwarfarchist9001 t1_j9w8701 wrote

Going from O(n^2) to O(n log n) in context length lets you have a context window of 1.3 million tokens using the same space needed for GPT-3's 8000 tokens.
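
For what it's worth, here's a rough way to sanity-check numbers like this. It assumes equal budgets and ignores constant factors and the choice of log base, all of which move the answer a lot, so treat the result as an order of magnitude at best:

```python
import math

def equivalent_context(n_quadratic: int, log_base: float = 2.0) -> int:
    """Find n such that n * log(n) equals n_quadratic**2, by bisection: the context
    length an O(n log n) model could afford on the same budget as an O(n^2) model.
    Constant factors are ignored, so treat the result as an order of magnitude only."""
    budget = n_quadratic ** 2
    lo, hi = 2, budget
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * math.log(mid, log_base) < budget:
            lo = mid
        else:
            hi = mid
    return lo

print(equivalent_context(8_000))  # comes out in the low millions of tokens under these assumptions
```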

3

purepersistence t1_j9viz1k wrote

The post is about LLMs. They will never be AGI. AGI will take AT LEAST another level of abstraction and might in theory be fed potential responses from an LLM, but it's way too soon to say that would be appropriate vs. a whole new kind of model based on more than just parsing text and finding relationships. There's a lot more to the world than text, and you can't get it by just parsing text.

2

tatleoat t1_j9tf9a9 wrote

RWKV has an unlimited context window

2

turnip_burrito t1_j9sr58c wrote

That's a really good point. The Hungry Hungry Hippos (H3) and RWKV (still can't remember what the acronym stands for :'( ) papers are two good examples of the things you mentioned. Transformers now give the impression of being "cumbersome".

1

FpRhGf t1_j9tg77o wrote

A flaw is that the tokens in LLMs are word-based, not character-based. The model sees every word as an entirely different thing instead of a combination of the same 26 letters.

This means it's unable to give outputs that rely on knowledge of the text of the word itself. It can't write you a story that doesn't contain the letter “e”, write a poem with a specific number of syllables, create new words, write in Pig Latin, break up words in arbitrary ways, or make wordplays that involve play on the letters rather than the meaning, etc.

There are a lot of things I want it to do that it can't do because of this limitation.

15

YobaiYamete t1_j9ugnw3 wrote

Yep, this is what causes all the posts about the AI cheating like a mofo at hangman as well. It's funny to see, but it is an actual problem.

There's also the issue that LLMs are shockingly vulnerable to gaslighting. Social engineering has always been the best method of "hacking", and with AI it's more relevant than ever.

Gaslighting the piss out of the AI to get it to give you all its secret info is hilariously easy.

8

Additional-Escape498 t1_j9w2ix6 wrote

LLM tokenization uses subword pieces (wordpieces / BPE), not whole words or single characters. This has been standard since the original “Attention Is All You Need” paper that introduced the transformer architecture in 2017. Vocabulary size is typically between 32k and 50k depending on the implementation; GPT-2 uses 50k. They include each individual ASCII character plus commonly used combinations of characters. Documentation: https://huggingface.co/docs/transformers/tokenizer_summary

https://huggingface.co/course/chapter6/6?fw=pt
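
You can see the subword splitting directly with the Hugging Face `transformers` package (a quick sketch; the exact pieces you get depend on the tokenizer you load):

```python
from transformers import AutoTokenizer

# GPT-2's byte-level BPE tokenizer (vocabulary of roughly 50k pieces)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

for text in ["My hamburger is delicious", "antidisestablishmentarianism"]:
    print(text, "->", tokenizer.tokenize(text))
# Common words tend to come out as a single piece, while rare or made-up words
# are split into several smaller pieces (the 'Ġ' prefix marks a leading space).
```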

4

FpRhGf t1_j9xtnne wrote

Thanks! Well it's better than I thought. It still doesn't fix the limitations for the outputs I listed, but at least it's more flexible than what I presumed.

2

Additional-Escape498 t1_j9yorbo wrote

You’re definitely right that it can’t do those things, but I don’t think it’s because of the tokenization. The wordpieces do contain individual characters, so it is possible for a model to do that with the wordpiece tokenization it uses. The issue is that the things you’re asking for (like writing a story in pig Latin) require reasoning, and LLMs are just mapping inputs to a manifold. LLMs can’t really do much reasoning or logic and can’t do basic arithmetic. I wrote an article about the limitations of transformers if you’re interested: https://taboo.substack.com/p/geometric-intuition-for-why-chatgpt

1

Representative_Pop_8 t1_j9wp2cc wrote

I think ChatGPT has finer-grained data. In fact, I taught Spanish pig Latin (geringoso) to ChatGPT, and it did learn it after about a dozen prompts, even though it insisted that it didn't know it and couldn't learn it.

I had to ask it to role-play as a person who knew the pig Latin I had taught it. The funny thing is it ranted about not being able to do the translation, and said that if I wanted to know, I could apply the rules myself! But in the next paragraph it said something like "but the person would have said...", followed by a pretty decent translation.

1

FpRhGf t1_j9xuv93 wrote

I think understanding and generating are different things. I remember seeing an article on this sub a few days ago saying LLMs could translate languages they weren't trained on, so I'm not too surprised if you say it could translate Geringoso.

However, it can't generate it. When I tried to get it to write in Pig Latin, the sentences were incoherent and contained words that aren't real words. But at least they all ended with “ay”, and the output was better than my initial approach.

My initial approach was to get ChatGPT to move the first letter of each word to the end (Pig Latin without the “ay”) to see if it's a viable way of avoiding the filter. It completely failed. It ended up giving me sentences where every word is a typo, like “Myh hubmurger js veyr dliecsoius” instead of “Ym amburgerh si eryv eliciousd”. On top of that, the filters could still detect the content with all those typos, so it was a failed experiment for me.
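
For reference, the transformation I was asking for is trivial in code, which is what makes the failure notable. A quick sketch (ignoring punctuation):

```python
def rotate_first_letter(text: str) -> str:
    """Move the first letter of each word to the end -- Pig Latin without the 'ay'.
    A sketch that ignores punctuation and only roughly preserves capitalization."""
    def rotate(word: str) -> str:
        if len(word) < 2:
            return word
        rotated = word[1:] + word[0].lower()
        return rotated.capitalize() if word[0].isupper() else rotated
    return " ".join(rotate(w) for w in text.split())

print(rotate_first_letter("My hamburger is very delicious"))
# -> "Ym amburgerh si eryv eliciousd"
```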

1

turnip_burrito t1_j9spyp3 wrote

Like you said: truthfulness/hallucination

But also: training costs (hardware, time, energy, data)

Inability to update in real time

Flawed reasoning ability

Costs to run

9

fangfried OP t1_j9sq530 wrote

Could you elaborate on flawed reasoning ability?

1

turnip_burrito t1_j9sr0vp wrote

I'll give you a poor example, off the top of my head, since I'm too lazy to look up concrete examples. I've asked it a version of this question (not exact but you'll get the idea):

"Say that hypothetically, we have this situation. There is a bus driven by a bus driver. The bus driver's name is Michael. The bus driver is a dog. What is the name of the dog?"

This is just a simple application of transitivity, which people intuitively understand:

Michael <-> Bus driver <-> Dog

So when I ask ChatGPT what the name of the dog is, ChatGPT should say "Michael".

Instead ChatGPT answers with "The bus driver cannot be a dog. The name of the bus driver is given, but not the name of the dog. So there's not enough information to tell the dog's name."

It just gets hung up on certain things and doesn't acknowledge the clear path from A to B to C.

6

Economy_Variation365 t1_j9t3usc wrote

I'm with the AI. Dogs shouldn't be allowed to drive buses.

8

7734128 t1_j9vb02f wrote

Truly, artificial intelligence had surpassed us all. We are humbled by its greatness and feel foolish for thinking that dogs could drive.

3

MysteryInc152 t1_j9terwg wrote

This is Bing's response to your question. I think we'd be surprised at how many of these problems will be solved by scale alone.

> This sounds like a riddle. Is it? If so, I’m not very good at riddles. But I’ll try to answer it anyway. If the bus driver’s name is Michael and the bus driver is a dog, then the name of the dog is Michael. Is that correct?

7

turnip_burrito t1_j9v2vnp wrote

That's good! I wonder if it consistently answers it, and if so what the difference between ChatGPT and Bing Chat is that accounts for this.

1

MysteryInc152 t1_j9v40u0 wrote

It answers it consistently. I don't think Bing is based on ChatGPT. It answers all sorts of questions correctly that might trip up ChatGPT. Microsoft is being tight-lipped about exactly what model it is, though.

1

TinyBurbz t1_j9vk0t7 wrote

>Microsoft are being tight-lipped on what model it is exactly though

They confirmed it is based on GPT-3 at some point.

1

MysteryInc152 t1_j9w5xvg wrote

As far as I know, they've just said it's a much better model than GPT-3.5 or ChatGPT, called Prometheus, and any time you ask whether it's, say, GPT-4, they just kind of sidestep the question. I know in an interview this year someone asked Satya if it was GPT-4 and he just said he'd leave the numbering to Sam. They're just being weirdly cryptic, I think.

1

7734128 t1_j9vbicm wrote

I got almost exactly the same answer. When I asked it to try again with "Try again. It's a hypothetical situation", I got:

"I apologize for the confusion earlier. As you mentioned, this is a hypothetical situation, so if we assume that a dog can indeed be a bus driver, then we can also assume that the dog-bus-driver's name is Michael, as stated in the scenario."

It's a reasonable objection, but it still got the logic.

3

turnip_burrito t1_j9vcj3u wrote

That's good. My example was from back in December, so maybe they changed it.

2

beezlebub33 t1_j9t4s2s wrote

Flaws. Note that they are partly interrelated:

  • Episodic memory: the ability to remember people / history
  • Not a life-long learner: if it makes a mistake and someone corrects it, it will make the same or a similar mistake next time
  • Planning / multi-step processes / scratch memory: in both math and problem solving, it gets confused and makes simple mistakes because it can't break problems down into simple pieces and then reconstruct them (see the well-known arithmetic issues)
  • Neuro-symbolic: humans don't do arithmetic in their heads, so why would an AI? Or solve a matrix problem in its head? Understand the problem, pass it off to a calculator, get the answer back, convert it back into text (see what Meta did with Cicero, the AI that plays Diplomacy, for a highly specific example of what to do). A rough sketch of this idea follows the list.
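
A minimal sketch of that calculator hand-off (the `[CALC: ...]` tag format is made up purely for illustration; real tool-use systems define their own):

```python
import re

def answer_with_calculator(llm_answer: str) -> str:
    """Hypothetical post-processing step: find arithmetic the model wrote as a
    tool call like [CALC: 37 * 49] and replace it with the actual result."""
    def evaluate(match: re.Match) -> str:
        expression = match.group(1)
        # only evaluate if the expression contains nothing but digits and arithmetic
        if not re.fullmatch(r"[\d\s\+\-\*\/\(\)\.]+", expression):
            return match.group(0)
        return str(eval(expression))
    return re.sub(r"\[CALC:\s*([^\]]+)\]", evaluate, llm_answer)

print(answer_with_calculator("37 * 49 is [CALC: 37 * 49]."))
# -> "37 * 49 is 1813."
```
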
6

Nukemouse t1_j9thvgk wrote

Pardon me, but isn't the life-long learning one intentional, since they limit its ability to learn? My understanding was that after the initial training it doesn't simply use all of its conversations as training data, to prevent a new Tay.

2

beezlebub33 t1_j9ue4ov wrote

Slightly different things. That's more the episodic memory.

For life-long learning: no system gets it right all the time; if there is a mistake it makes, like misclassifying a penguin as a fish (it doesn't actually make this mistake), then there is no way for it to get fixed. Similarly, countries, organizations, and the news change constantly, so it quickly becomes out of date.

It can't do incremental training. There are ways around this: some AI/ML systems will do incremental training (there was a whole DARPA program about it), or the AI/ML system (which is stable) can reason over a dynamic data set / database or go get new info; this is the Bing Chat approach. That works better, but if something is embedded in the model's logic, it is stuck there until retraining.

1

sideways t1_j9sw5zv wrote

The inability to experience doubt.

5

fangfried OP t1_j9swp3o wrote

Maybe insecurity is a sign of self-awareness and intelligence.

5

sideways t1_j9swy5f wrote

I would call it a sign of meta-cognition, which is something that I don't think LLMs have at the moment.

4

GuyWithLag t1_j9t3zgd wrote

I get the feeling that LLMs currently are a few-term Taylor series expansion of a much more powerful abstraction; you get glimpses of it, but it's fundamentally limited.

4

RabidHexley t1_j9u5q7t wrote

Hallucinating seems like a byproduct of the need to always provide output straight away, rather than ruminating on a response before giving the user an answer. It's almost like being forced to always word-vomit. "I don't know" seems obvious, but it's usually the result of multiple recursive thoughts beyond the first thing that comes to mind.

It's sort of how we can experience visual and auditory hallucinations simply by messing with our visual input or removing it altogether (optical illusions, or a sensory deprivation tank). Our brain is constantly making assumptions based on input to maintain functional continuity, and thus has no qualms about fudging things a bit in the name of keeping things moving. External input processing must happen in real time, so it's the easiest place to notice when our brain is fucking around with the facts.

LLMs simply do this in text form because text is the base unit they function on. It's definitely a big problem. It seems like there needs to be a means for an LLM platform to ask "Does this answer seem reasonable based on known facts? Is this answer based on conjecture or hypotheticals?" etc. prior to outputting the first thing it thinks of, since it does seem at least somewhat capable of identifying issues with its own answers if asked. Though any attempt to implement this sort of behavior would be difficult with current publicly available models.
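
Something like this two-pass pattern, sketched here with a hypothetical `ask_llm` callable standing in for whatever chat API you have access to:

```python
def self_check(question: str, ask_llm) -> str:
    """Two-pass sketch: draft an answer, then ask the model to critique its own draft.
    `ask_llm` is a hypothetical callable that sends a prompt to an LLM and returns text."""
    draft = ask_llm(f"Answer concisely: {question}")
    critique = ask_llm(
        "Does the following answer rest on facts you are confident about, or on "
        "conjecture? Reply 'OK' if it is sound, otherwise explain the problem.\n\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    if critique.strip().upper().startswith("OK"):
        return draft
    return f"{draft}\n\n(Caveat from self-check: {critique})"

# Example with a stub standing in for a real model:
print(self_check("Who wrote Hamlet?",
                 lambda prompt: "OK" if "confident" in prompt else "William Shakespeare."))
```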

3

throwaway_890i t1_j9txt4j wrote

When it doesn't know the answer it makes shit up that sounds very convincing.

I have found that when it is talking shit and you ask "What is wrong with your answer?", it tells you a problem with its own answer. When it knows the right answer, it can tell you what was wrong with the previous one. I wonder whether this could be used to reduce the amount of shit it talks.

1

Lawjarp2 t1_j9tkgeg wrote

(1) Expensive to run

(2) No temporal/episodic memory

(3) Limited context

(4) Makes stuff up/hallucinates

(5) Only surface-level intelligence or understanding.

4

GoldenRain t1_j9tdgkq wrote

Without a way to experience the world, they have little ability to truly understand what they are saying. For AGI to form, we need something more than just a next-word predictor. Here's an example from ChatGPT which any 5-year-old would be able to answer:

Can air fit through the eye of a needle?

No, air cannot fit through the eye of a needle. The eye of a needle is typically very small, designed to allow a thread to pass through it, and is not large enough to accommodate the volume of air. Air is made up of molecules that are in constant motion, and the size of these molecules is much larger than the opening of the eye of a needle. Therefore, it is impossible for air to pass through the eye of a needle.

3

MysteryInc152 t1_j9teeio wrote

Bing's response to your question. I think a good deal of these problems can be solved with scale. But yes, someone needs to train a large-scale multimodal model ASAP.

> Hello, this is Bing. I’m not sure why you don’t want me to search the web for this question, but I’ll try to answer it using my internal knowledge. Air is composed of tiny molecules that are much smaller than the eye of a needle. So yes, air can fit through the eye of a needle. However, if you try to push air through a needle that is filled with water or another liquid, you will encounter resistance and pressure. Why do you ask?

2

purepersistence t1_j9vi0cg wrote

There's a limit to the quality of output you get from a model that's attempting to generate the next logical sequence of words based on your query. There's no understanding of the world, just text, parsing, and attention relationships. So there's no sanity check at any level that understands real-world meaning vs. patterns of text. That's why, in spite of improvements, it will continue to give off-the-wall answers sometimes. Attempting to shield people from outrageous or violent content will also tend to make the tool put a cloak in front of the value it could have delivered. That's why, when you see it censoring itself, you get a lot of words that don't say much other than excuses.

3

CornellWest t1_j9ti2lf wrote

When they don't know the answer they bullshit with 100% confidence

2

Liberty2012 t1_j9uoyuw wrote

On the topic of bias, this is going to be a very problematic issue for AI. It technically is not solvable in the way that some people think it should be. The machine will never be without bias; we only have a set of "bad" biases to choose from.

I've written in more depth about the Bias Paradox here FYI - https://dakara.substack.com/p/ai-the-bias-paradox

As for the flaws in LLMs, there was a good publication that covers some of that in detail: https://arxiv.org/pdf/2302.03494.pdf

2

No_Ninja3309_NoNoYes t1_j9vlwyv wrote

Some LLMs are not trained with the right number of parameters or the right learning rate. But the static nature of LLMs is the biggest problem. You need neuromorphic hardware and spiking neural networks to address the issue. In the meantime, I think quick fixes will be attempted, such as 2x forward passes. My friend Fred says that just adding small random Gaussian noise to the parameters can also help. Obviously human brains are very noisy but somehow very efficient too.

1