Submitted by SejaGentil t3_xyv3ht in MachineLearning
harharveryfunny t1_irslrit wrote
Reply to comment by SejaGentil in [D] Why can't language models, like GPT-3, continuously learn once trained? by SejaGentil
GPT-3 isn't just a generic stack of layers - its architecture is called a "transformer", and it's highly structured. Nowadays there are many different architectures in use.
The early focus on layers was because there are simple things, such as learning the XOR function, that a single-layer network can't do, but originally (50 years ago) no one knew how a multi-layer network could be trained. The big breakthrough therefore came c. 1980, when the back-propagation algorithm solved the multi-layer "credit assignment" training problem (and it also works on any network shape - graphs, etc.).
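(As a toy illustration of the XOR point - a single linear layer can't separate XOR, but a small two-layer network trained with back-propagation can. A minimal NumPy sketch, with the hidden size, learning rate, and iteration count chosen arbitrarily:)

```python
import numpy as np

# XOR truth table: not linearly separable, so no single-layer network can fit it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))   # hidden layer weights (8 units, arbitrary)
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))   # output layer weights
b2 = np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):                    # plain full-batch gradient descent
    h = sigmoid(X @ W1 + b1)              # forward pass, hidden layer
    out = sigmoid(h @ W2 + b2)            # forward pass, output layer
    d_out = (out - y) * out * (1 - out)   # backprop: MSE loss through sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)    # backprop: credit assigned to hidden layer
    W2 -= h.T @ d_out;  b2 -= d_out.sum(axis=0)
    W1 -= X.T @ d_h;    b1 -= d_h.sum(axis=0)

print(out.round().ravel())   # predictions for the four XOR inputs
```

The key step is the second backward line: back-propagation passes the output error through W2 to assign credit (blame) to the hidden units - exactly the "credit assignment" problem mentioned above.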
The "modern neural net era" really dates to the ImageNet image recognition competition in 2012 (only 10 years ago!), when a multi-layer neural net (AlexNet) beat the older non-ANN image recognition approaches by a significant margin. For a number of years after that, the main focus of researchers was performing better on the ImageNet challenge with ever more elaborate and deeper multi-layer networks.
Today, ImageNet is really a solved problem, and the focus of neural nets has shifted to other applications such as language translation, speech recognition, generative language models (GPT-3) and recent text-to-image and text-to-video networks. These newer applications are more demanding and have required more sophisticated architectures to be developed.
Note that our brain actually has a lot of structure to it - it's not just one giant graph where any neuron could connect to any other neuron. For example, our visual cortex has what is roughly a layered architecture (areas V1-V5), which is part of why multi-layer nets have done so well in vision applications.
SejaGentil OP t1_irstl4q wrote
Thanks for this overview, it makes a lot of sense. Do you have any idea why GPT-3, DALL-E and the like are so bad at generating new insights and at logical reasoning? My feeling is that these networks are very good at recall - like a very dumb human who compensates for it with a Wikipedia-sized memory. For example, if I give GPT-3 a prompt like this:
This is a logical question. Answer it using exact, mathematical reasoning.
There are 3 boxes, A, B, C.
I take the following actions, in order:
- I put 3 balls in box A.
- I move 1 ball from box A to box C.
- I swap the contents of box A and box B.
How many balls are in each box?
It will fail miserably. Trying to teach it any kind of programming logic is a complete failure; it can't get very basic questions right, and asking it to go step by step doesn't help. For me, the main goal of AGI is to be able to teach a computer to prove theorems in a proof assistant like Agda, and have it be as apt as myself. But GPT-3 is as unapt as every other AI, and it seems like scaling won't change that. That's why, to me, it feels like AI as a whole is making zero progress towards (my concept of) AGI, even though it is doing amazing feats in other realms, and that's quite depressing. I use GPT-3 Codex a lot when coding, but only for repetitive, trivial work like converting formats. Anything that needs any sort of reasoning is out of its reach. Similarly, DALL-E is completely unable to generate new image concepts (like a cowboy riding an ostrich, or a cow with a duck beak...).
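(For reference, the puzzle itself is trivial to check mechanically - a few lines of Python tracking the box contents give the answer A = 0, B = 2, C = 1:)

```python
# Track the three actions from the prompt on a simple dict of box contents.
boxes = {"A": 0, "B": 0, "C": 0}

boxes["A"] += 3                                   # put 3 balls in box A
boxes["A"] -= 1; boxes["C"] += 1                  # move 1 ball from A to C
boxes["A"], boxes["B"] = boxes["B"], boxes["A"]   # swap contents of A and B

print(boxes)  # {'A': 0, 'B': 2, 'C': 1}
```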
harharveryfunny t1_irtbnwz wrote
GPT-3 isn't an attempt at AI. It's literally just a (very large) language model. The only thing it is designed to do is "predict the next word", and it does that in a very dumb way via the mechanism of a transformer - just using attention (tuned via the massive training set) to weight the recently seen words when making that prediction. GPT-3 was really just an exercise in scaling up, to see how much better (if at all) a "predict next word" language model could get if the capacity of the model and the size of the training set were scaled up.
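(As a rough illustration - a toy NumPy sketch, not GPT-3's actual implementation - the core attention step just turns similarity scores between the current word and earlier words into weights that sum to 1, then takes a weighted average of the earlier words' representations:)

```python
import numpy as np

def causal_attention(q, k, v):
    """Toy scaled dot-product attention with a causal mask:
    each position may attend only to itself and earlier positions."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # similarity of every position to every other
    mask = np.triu(np.ones_like(scores), k=1)     # 1s above the diagonal = future positions
    scores = np.where(mask == 1, -1e9, scores)    # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v                            # weighted average of earlier representations

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))          # 5 "words", each an 8-dim vector (toy numbers)
out = causal_attention(x, x, x)
print(out.shape)                     # (5, 8)
```

Because of the causal mask, the first position can only attend to itself, and each later word's output mixes in the words before it - which is what lets the model condition its next-word prediction on the recently seen context.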
We would expect GPT-3 to do a good job of predicting the next word in a plausible way (e.g. "the cat sat on the" => mat), since that is literally all it was trained to do, but the amazing, and rather unexpected, thing is that it can do so much more... Feed it "There once was a unicorn", and it'll start writing a whole fairy tale about unicorns. Feed Codex "Reverse the list order" and it'll generate code to perform that task, etc. These are all emergent capabilities - not things it was designed to do, but things it needed to learn to do (and evidently was capable of learning, via its transformer architecture) in order to get REALLY good at its "predict next word" goal.
Perhaps the most mind-blowing Codex capability was in the original release demo video from OpenAI, where it was fed the Microsoft Word API documentation and was then able to USE that information to write code to perform a requested task ("capitalize the first letter of each word", if I remember correctly)... So think about it - it was only designed/trained to "predict next word", yet it is capable of "reading API documentation" to write code for a requested task!!!
Now, this is just a language model, not claiming to be an AI or anything else, but it does show you the power of modern neural networks, and perhaps gives some insight into the relationship between intelligence and prediction.
DALL-E isn't claiming to be an AI either, and it has a simple flow-through architecture: it basically just learns a text embedding, maps it to an image embedding, and then decodes that to the image. To me it's more surprising that something so simple works as well as it does than disappointing that it only works for fairly simple kinds of compositional requests. It will certainly do its best to render things it was never trained on, but you can't expect it to do very well with something like "two cats wrestling", since it has no knowledge of cats' anatomy, 3-D structure, or how their joints move. What you get is about what you'd expect given what the model consists of. Again, it's a pretty simple flow-through text-to-image model, not an AI.
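(Schematically, the flow-through pipeline is just a composition of three learned stages. In this toy sketch, random matrices stand in for the trained networks, and all dimensions are made up for illustration - the real model's components differ:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three learned stages:
# text encoder -> prior (text emb -> image emb) -> image decoder.
W_text  = rng.normal(size=(512, 100))          # token features -> text embedding
W_prior = rng.normal(size=(512, 512))          # text embedding -> image embedding
W_dec   = rng.normal(size=(64 * 64 * 3, 512))  # image embedding -> pixels

def generate(token_features):
    text_emb  = W_text @ token_features   # "learns a text embedding"
    image_emb = W_prior @ text_emb        # "maps it to an image embedding"
    pixels    = W_dec @ image_emb         # "decodes that to the image"
    return pixels.reshape(64, 64, 3)

img = generate(rng.normal(size=100))
print(img.shape)   # (64, 64, 3)
```

The point of the sketch is just that nothing in this pipeline knows anything about anatomy or 3-D structure - it is a single feed-forward mapping from text features to pixels.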
For any model to begin to meet your expectations of something "intelligent", it's going to have to be designed with that goal in the first place, and that is still in the future. So GPT-3 is perhaps a taste of what is to come... if a dumb language model is capable of writing code(!!!), then imagine what a model that is actually designed to be intelligent should be capable of...
SejaGentil OP t1_iruye6s wrote
Just replying to thank you for all the info - I don't have any more questions for now.