
4e_65_6f t1_itn1bfj wrote

Doesn't that lead to overly generic answers? Like it will pick what most people would likely say rather than the truth? I remember making a model that filled in the most common next word, and it would get stuck going "is it is it is it..." and so on. I guess that method could produce very good answers, but that will depend on the data itself.
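That repetition loop is easy to reproduce with a toy bigram model. This is a hypothetical sketch (not the model described above): it counts next-word frequencies in a tiny corpus and always picks the single most common continuation, which traps it in a cycle.

```python
from collections import Counter, defaultdict

# Tiny toy corpus; "it -> is" and "is -> it" dominate the bigram counts.
corpus = "it is it is it is what".split()

# next-word frequency table: bigrams[w1][w2] = count of (w1, w2)
bigrams = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigrams[w1][w2] += 1

def greedy_generate(start, steps=8):
    out = [start]
    for _ in range(steps):
        nxt = bigrams[out[-1]]
        if not nxt:
            break
        # Greedy decoding: always take the most common continuation.
        out.append(nxt.most_common(1)[0][0])
    return " ".join(out)

print(greedy_generate("it"))  # cycles: "it is it is it is ..."
```

Because the argmax never changes, the generator bounces between "it" and "is" forever; any kind of sampling breaks the cycle.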


harharveryfunny t1_itn57pm wrote

They increase the sampling "temperature" (amount of randomness) during the varied answer generation phase, so they will at least get some variety, but ultimately it's GIGO: garbage in, garbage out.
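For anyone unfamiliar with temperature: it divides the logits before the softmax, so low T approaches greedy argmax and high T flattens the distribution toward uniform. A minimal sketch (illustrative only, not the paper's actual decoding code):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    # Scale logits by 1/T: as T -> 0 this approaches argmax,
    # larger T flattens the distribution and adds variety.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [1.0, 5.0, 2.0]
print(sample_with_temperature(logits, temperature=0.01))  # almost always index 1
print(sample_with_temperature(logits, temperature=5.0))   # varies across indices
```

At T=0.01 the top logit dominates so completely that you effectively get greedy decoding; at T=5 the three options are sampled with nearly equal probability, which is what gives the varied candidate answers.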

How useful this technique is would seem to depend on the quality of the data it was initially trained on and the quality of the deductions it was able to glean from that. Best case, this might work as a way to clean up its training data by rejecting bogus conflicting rules it has learnt. Worst case, it'll reinforce bogus chains of deduction and ignore the hidden gems of wisdom!

What's really needed to enable any system to self-learn is feedback from the only source that really matters: reality. Feedback from yourself, based on what you think you already know, might make you more consistent, but not more correct!
