Submitted by xutw21 t3_ybzh5j in singularity
harharveryfunny t1_itn0gfi wrote
Reply to comment by 4e_65_6f in Large Language Models Can Self-Improve by xutw21
They're not scaling up the model; it's more like making the model more consistent when answering questions (see the sketch after the list):
- Generate a bunch of different answers to the same question
- Assume the most common answer to be the right one
- Retrain with this question and "correct" answer as an input
- Profit
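Roughly, in code - a minimal sketch of the loop, where `sample_answer` and `fine_tune` are hypothetical stand-ins for the real sampling call and fine-tuning step (not the paper's actual pipeline):

```python
# Minimal sketch of majority-vote self-training, with hypothetical
# stand-ins for the real sampling and fine-tuning steps.
import random
from collections import Counter

def sample_answer(model, question, temperature=0.7):
    # Stand-in: a real model would sample a chain-of-thought answer here.
    return random.choice(["42", "42", "42", "41"])  # noisy, but mostly "42"

def fine_tune(model, examples):
    # Stand-in: a real pipeline would fine-tune on the self-labelled pairs.
    print(f"fine-tuning on {len(examples)} self-labelled examples")

def self_improve(model, questions, n_samples=32):
    new_training_data = []
    for q in questions:
        # 1. Generate a bunch of different answers to the same question
        answers = [sample_answer(model, q) for _ in range(n_samples)]
        # 2. Assume the most common answer is the right one
        majority_answer, _ = Counter(answers).most_common(1)[0]
        # 3. Keep (question, "correct" answer) as a new training example
        new_training_data.append((q, majority_answer))
    # 4. Retrain on the self-labelled data ("profit")
    fine_tune(model, new_training_data)

self_improve(model=None, questions=["What is 6 * 7?"])
```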
It's kind of like prompt engineering - they're not putting more data or capability into the model, but rather finding out how to (empirically) make the best of what it has already been trained on. I guess outlier-answer rejection would be another way of looking at it.
Instead of "think step by step", this is basically "think step by step, try it a few times, and tell me the most common answer", except it can't be done at runtime - it requires retraining the model.
4e_65_6f t1_itn1bfj wrote
Doesn't that lead to overly generic answers? Like it will pick what most people would likely say rather than the truth? I remember making a model that filled in the most common next word, and it would get stuck going "is it is it is it..." and so on. I guess that method could result in very good answers, but that will depend on the data itself.
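Something like this toy version of that failure mode, with a made-up corpus and greedy most-common-next-word decoding:

```python
# Toy demo: a bigram model that always picks the most common next word
# gets stuck in a loop. The corpus is made up for illustration.
from collections import Counter, defaultdict

corpus = "is it is it is it fun".split()

bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1  # count how often w2 follows w1

word = "is"
output = [word]
for _ in range(8):
    word = bigrams[word].most_common(1)[0][0]  # greedy: most common next word
    output.append(word)

print(" ".join(output))  # "is it is it is it is it is"
```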
harharveryfunny t1_itn57pm wrote
They increase the sampling "temperature" (amount of randomness) during the varied-answer generation phase, so they will at least get some variety, but ultimately it's GIGO - garbage in => garbage out.
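For what it's worth, a minimal sketch of what temperature does (the logits here are illustrative, not from any real model): logits get divided by the temperature before the softmax, so higher values flatten the distribution and give more varied samples.

```python
# Sketch of temperature sampling: logits are divided by the temperature
# before the softmax, so higher T flattens the distribution.
import numpy as np

def sample_with_temperature(logits, temperature):
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

logits = np.array([2.0, 1.0, 0.5])
for t in (0.1, 1.0, 1.5):
    samples = [sample_with_temperature(logits, t) for _ in range(1000)]
    print(t, np.bincount(samples, minlength=3) / 1000)  # higher T -> flatter counts
```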
How useful this technique is would seem to depend on the quality of the data it was initially trained on and the quality of the deductions it was able to glean from that. Best case, this might work as a way to clean up its training data by rejecting bogus conflicting rules it has learnt. Worst case, it'll reinforce bogus chains of deduction and ignore the hidden gems of wisdom!
What's really needed to enable any system to self-learn is feedback from the only source that really matters - reality. Feedback from yourself, based on what you think you already know, might make you more rational, but not more correct!