
harharveryfunny t1_ixcpady wrote

It seems to me the primary learning mode in the brain - what it fundamentally/automatically does via its cortical architecture - is sequence prediction (as in predicting the next word). Correspondingly, the primary way we learn language as children is by listening and copying, and the most effective language-learning methods for adults have also been found to be immersive.

Reinforcement learning can also be framed in terms of prediction (predicting reward/response), and I suspect this is how "learning via advice" (vs. experience) works, though learning from experience seems the more fundamental and powerful of the two - note how we learn more easily from our own experience than from the advice of others.
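To make the "RL as prediction" framing concrete, here's a toy sketch (my own illustration, not from any specific paper): tabular TD(0) value learning on a three-state chain, where the entire update is driven by a reward-prediction error rather than any separate reward-maximizing machinery.

```python
# Toy sketch: TD(0) value prediction on a 3-state chain (0 -> 1 -> 2).
# The only learning signal is delta, a *prediction error* - i.e. the
# agent is learning to predict reward/response, nothing more.
states = [0, 1, 2]                  # state 2 is terminal
reward = {0: 0.0, 1: 0.0, 2: 1.0}   # reward received on *entering* each state
V = {s: 0.0 for s in states}        # value estimates (reward predictions)
alpha, gamma = 0.1, 0.9             # learning rate, discount factor

for episode in range(500):
    s = 0
    while s != 2:
        s_next = s + 1                                      # deterministic chain
        r = reward[s_next]
        target = r + (0.0 if s_next == 2 else gamma * V[s_next])
        delta = target - V[s]                               # reward-prediction error
        V[s] += alpha * delta                               # nudge prediction toward target
        s = s_next

print(V)  # V[1] converges to ~1.0, V[0] to ~0.9 (= gamma * V[1])
```

The point of the sketch: "value" here is just a running prediction of upcoming reward, which is the sense in which reward-maximizing behavior can sit on top of a purely predictive mechanism.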

I think reinforcement learning is over-hyped: in animals, reward maximization looks more like a description of behavior (emerging from predictive mechanisms) than the actual underlying mechanism itself.

As far as ML goes, RL as a mechanism seems a very tricky beast, notwithstanding DeepMind's successes, whereas predictive transformer-based LLMs are simple to train and ridiculously powerful, exhibiting all sorts of emergent behavior.
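The "simple to train" point is really about the objective: an LLM just minimizes next-token cross-entropy. Here's a toy illustration of that objective (my own sketch - a count-based bigram model stands in for the transformer, but the loss being computed is the same one):

```python
# Toy sketch of the LLM training objective: next-token prediction.
# A count-based bigram model stands in for a transformer; the training
# signal is just cross-entropy on the next token - no reward involved.
import math
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

# "Train": count next-token frequencies for each context token.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token_probs(prev):
    """Predicted distribution over the next token given the previous one."""
    c = counts[prev]
    total = sum(c.values())
    return {tok: n / total for tok, n in c.items()}

# Evaluate: average next-token cross-entropy, the quantity a
# transformer LM minimizes by gradient descent.
pairs = list(zip(corpus, corpus[1:]))
nll = -sum(math.log(next_token_probs(prev)[nxt]) for prev, nxt in pairs)
avg_nll = nll / len(pairs)
print(avg_nll)  # average negative log-likelihood per token, in nats
```

A real transformer replaces the count table with a learned function of the whole context, but the loss - and hence the simplicity of the training setup - is exactly this.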

I can't see the motivation for wanting to develop RL-based language models - it makes more sense to me to do the opposite and pursue prediction-based reward maximization.
