Submitted by SejaGentil t3_xyv3ht in MachineLearning
GPT-3 has a prompt limit of about 2048 tokens, each of which corresponds to roughly 4 characters of text. If my understanding is correct, a deep neural network does not learn after it is trained and is used to produce an output, and, as such, this limitation comes from the number of input neurons. My question is: what is stopping us from using the same algorithm we use for training while using the network? That would allow it to adjust its weights and, in a way, provide a form of long-term memory which could let it handle prompts of arbitrary length. Is my line of thinking wrong?
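The practical effect of a fixed context window can be sketched in a few lines. This is a toy illustration only: whitespace splitting stands in for GPT's actual byte-pair encoding, and the truncation shown is just the simplest "keep the most recent tokens" policy.

```python
# Toy illustration of a fixed context window.
# Hypothetical tokenizer: whitespace splitting stands in for GPT's
# learned subword (byte-pair) vocabulary.
CONTEXT_LIMIT = 2048  # GPT-3's approximate token budget

def truncate_to_context(prompt: str, limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep only the most recent `limit` tokens; anything earlier is
    simply invisible to a model with a fixed input window."""
    tokens = prompt.split()
    return tokens[-limit:]

prompt = "the quick brown fox " * 1000  # ~4000 toy tokens
kept = truncate_to_context(prompt)
print(len(kept))  # 2048: the first ~2000 tokens are dropped
```

The point is that nothing about earlier text persists in the model itself; whatever falls outside the window contributes nothing to the next prediction.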
suflaj t1_irjffgj wrote
There are several reasons.
One is catastrophic forgetting. You can't just hope your model will always remember what it initially knew. Online training for GPT would imply relearning what it has already learned: because new data often changes old insights, the model would have to constantly rehearse at least the gist of its prior training. Otherwise it is just finetuning, and in practice you can see that finetuning can hurt the model's general knowledge.
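The effect is easy to reproduce even with a one-parameter model. The sketch below (all numbers illustrative, nothing to do with GPT itself) fits a scalar linear model to task A, then fine-tunes it on task B with no rehearsal of A, and the task A error comes back.

```python
# Toy demonstration of catastrophic forgetting: a one-parameter model
# y = w * x, trained with plain SGD on squared error.

def sgd(w: float, data: list[tuple[float, float]],
        lr: float = 0.05, epochs: int = 200) -> float:
    """Run SGD over (x, y) pairs; gradient of (w*x - y)^2 w.r.t. w."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w

def loss(w: float, data: list[tuple[float, float]]) -> float:
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]   # task A wants w = 2
task_b = [(x, -1.0 * x) for x in xs]  # task B wants w = -1

w = sgd(0.0, task_a)
loss_a_before = loss(w, task_a)  # near zero: task A is learned
w = sgd(w, task_b)               # "online" update on new data only
loss_a_after = loss(w, task_a)   # large again: task A was overwritten
print(loss_a_before, loss_a_after)
```

With only one weight there is nowhere for both tasks to live, which is the degenerate version of the problem; real networks have far more capacity, but sequential training with no replay of old data still drags shared weights toward the new objective at the expense of the old one, which is why rehearsal (mixing in old samples) is the standard mitigation.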
Another reason is that the new data might not be useful. Keep in mind that models as big as GPT-3 do not even go through their whole training set, just a small part of it, and they still generally perform strongly.
And finally, even if the new data were useful, there is no guarantee the model can make use of it at a given checkpoint (or from the start, even). The model might be too small, or its architecture and the task it is trained on might be inadequate for the data, etc. Now, GPTs aren't too small, and their architecture is well suited to learning, but we also do not know to what extent we are already utilizing their processing capacity and memory. There is no theoretical proof that we can do a lot more with them, let alone a procedure for how to do it.
So to conclude, the reason is simply that it has not been proven worth it. Small experiments prove almost nothing, and larger experiments would require resources, and therefore some promise of payoff to justify acquiring them (in a corporate setting).