visarga t1_ix2wyw6 wrote
Reply to comment by massimosclaw2 in [D] Are researchers attempting to solve the ‘omnipotence’ requirement problem in LLMs? by [deleted]
There is also prompt-tuning, which fine-tunes only a few token embeddings while keeping the model itself frozen. This changes the problem from finding that elusive prompt to collecting a few labeled examples and fine-tuning the prompt.
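As a rough sketch of what that looks like with PyTorch and Hugging Face Transformers (the model name, prompt length, and learning rate are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Freeze every model parameter; only the soft prompt will be trained.
for p in model.parameters():
    p.requires_grad = False

num_prompt_tokens = 20
embed_dim = model.get_input_embeddings().embedding_dim
soft_prompt = torch.nn.Parameter(torch.randn(num_prompt_tokens, embed_dim) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def train_step(text):
    ids = tokenizer(text, return_tensors="pt").input_ids            # (1, T)
    token_embeds = model.get_input_embeddings()(ids)                # (1, T, D)
    # Prepend the learned prompt embeddings to the real token embeddings.
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), token_embeds], dim=1)
    # Ignore the prompt positions in the loss with label id -100.
    labels = torch.cat([torch.full((1, num_prompt_tokens), -100), ids], dim=1)
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

print(train_step("Review: great movie. Sentiment: positive"))
```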
Another approach is to use an LLM to generate prompts and filter them by evaluation. This has also been used to generate step-by-step reasoning traces for datasets that only have input-output pairs; then you train another model on the examples plus chain of thought for a big jump in accuracy.
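The filtering loop is roughly this (a sketch under my own assumptions: `sample_fn` stands in for whatever LLM sampling call you use, and the "Answer:" trace format is illustrative):

```python
def extract_answer(trace: str) -> str:
    # Assume each generated trace ends with a line like "Answer: <value>".
    return trace.rsplit("Answer:", 1)[-1].strip()

def build_cot_dataset(examples, sample_fn, n_samples=8):
    """examples: (question, gold_answer) pairs with no rationales.
    sample_fn(prompt, n) -> list of n generated reasoning traces."""
    dataset = []
    for question, gold in examples:
        prompt = f"Q: {question}\nLet's think step by step."
        for trace in sample_fn(prompt, n_samples):
            # Keep only traces whose final answer matches the known label;
            # these (question, rationale) pairs become fine-tuning targets.
            if extract_answer(trace) == gold:
                dataset.append({"input": question, "target": trace})
                break
    return dataset
```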
There's a relevant paper on this, *Large Language Models Can Self-Improve*. They find that
> fine-tuning on reasoning is critical for self-improvement
I would add that sometimes you can evaluate a result directly, for example when generating math or code. Then you can learn from the validated outputs of the network. This is basically what AlphaZero used to reach superhuman level without supervision, but it requires a kind of simulator: a game engine, a Python interpreter, or a symbolic math engine.
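For code, the "simulator" can be as simple as executing each sample against tests in a subprocess and keeping the ones that pass (function names and the test setup here are my own assumptions):

```python
import subprocess
import sys
import tempfile

def passes_tests(candidate_code: str, test_code: str, timeout: float = 5.0) -> bool:
    """Run the candidate plus its tests in a subprocess; exit code 0 means valid."""
    program = candidate_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def validated_samples(samples, test_code):
    # Only interpreter-validated outputs become training targets.
    return [s for s in samples if passes_tests(s, test_code)]
```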