Submitted by SejaGentil t3_xyv3ht in MachineLearning
suflaj t1_irjffgj wrote
There are several reasons.
One is catastrophic forgetting. You can't just hope your model will always remember what it initially learned. Online training for GPT would imply relearning what it has already learned: it has to constantly repeat at least the gist of that, because new data often changes old insights. Otherwise it is just finetuning, and you can see in practice that finetuning can hurt the general knowledge of the model.
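To make the "repeat the gist" point concrete, here is a minimal sketch of rehearsal (replay), one common way to reduce forgetting: every batch of new data is padded with examples sampled from the old data. The function name `replay_batches` and its parameters are made up for illustration, and this is not how GPT is actually trained.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def replay_batches(
    new_data: Sequence[T],
    old_data: Sequence[T],
    batch_size: int = 8,
    replay_fraction: float = 0.5,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield batches that mix new examples with replayed old ones.

    Hypothetical helper: `replay_fraction` is the share of each batch
    drawn from `old_data`; the remainder comes from `new_data`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")

    n_old = int(batch_size * replay_fraction)
    n_new = batch_size - n_old  # always >= 1 because replay_fraction < 1
    if n_old > 0 and not old_data:
        raise ValueError("old_data is empty but replay was requested")

    rng = random.Random(seed)
    for start in range(0, len(new_data), n_new):
        # Each new example is visited exactly once, in order.
        fresh = list(new_data[start:start + n_new])
        # Old examples are sampled with replacement, so a small
        # replay buffer still works.
        replayed = [rng.choice(old_data) for _ in range(n_old)]
        batch = fresh + replayed
        rng.shuffle(batch)  # avoid a fixed new/old ordering inside the batch
        yield batch


if __name__ == "__main__":
    new = [("new", i) for i in range(10)]
    old = [("old", i) for i in range(100)]
    for batch in replay_batches(new, old, batch_size=4, replay_fraction=0.5):
        print(batch)
```

The cost is visible right away: with `replay_fraction=0.5`, half of every update is spent on data the model has already seen, which is part of why continual training is expensive compared to plain finetuning.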
Another reason is that the new data might not be useful. You have to understand that models as big as GPT-3 do not even make a full pass over their whole training set, only part of it, and they still generally have strong performance.
And finally, even if the new data were useful, there is no guarantee the model can make use of it at a given checkpoint (or from the start, even). The model might be too small, or its architecture or the task it is trained on might be inadequate for the data, etc. Now, GPTs aren't too small, and they have an architecture very adequate for learning, but we also do not know to what extent we are utilizing their processing capabilities and memory. There isn't exactly a theoretical proof that we can do a lot more with them, let alone a procedure for how to do it.
So to conclude, the reason is simply that it has not been proven to be worth it. Small experiments prove almost nothing, and larger experiments would require resources, and therefore some promise of a payoff to acquire them (in a corporate setting).
SejaGentil OP t1_irku09b wrote
Thanks for all the information. It was very helpful and I believe I better understand the whole thing now.