Submitted by blazejd t3_yzzxa2 in MachineLearning
Think about how children learn their native language. At the very beginning, they just listen to the language adults use. Later on, they start producing very basic forms of communication, using just a handful of single words. Only over time do they form longer sentences with correct grammar. Most importantly, they continuously interact with people who already speak the language (the environment), receive real-time feedback, and have their mistakes corrected by others. This sounds very similar to reinforcement learning.
On the other hand, current large language models "learn the language" by passively reading huge amounts of text and trying to predict the next word. While the results are impressive, this is not the most intuitive approach, and reinforcement learning feels like a more natural fit.
Why do you think the general research trend didn't go in this direction?
AlexGrinch t1_ix49fzx wrote
LMs do not aim to “learn” language; they just approximate the probability distribution of a given corpus of text. One way to do this is to factorize this probability via the chain rule and then assume conditional independence of words/tokens that are far from each other.
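For reference, the factorization being described is the standard chain rule over tokens, with a truncated context as the independence assumption (this formulation is standard, not spelled out in the original comment):

```
p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})
                   \approx \prod_{t=1}^{T} p(w_t \mid w_{t-k}, \dots, w_{t-1})
```

where the second line drops the dependence on tokens more than k positions back.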
Predicting the next word is not some weird heuristic: it's just an optimization problem that tries to approximate the joint probability of sequences of words. I believe it can be equivalently formulated as an MDP with a reward of 1 for (previous words → next word) transitions present in the dataset.
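A minimal sketch of that point (my own toy example, not from the comment): fitting a bigram model by maximum likelihood is exactly the "predict the next word" objective, and the training loss is the negative log-likelihood of the observed transitions. The corpus and vocabulary below are made up for illustration.

```python
import numpy as np

# Toy corpus, purely illustrative.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
tokens = sorted({w for s in corpus for w in s.split()})
idx = {w: i for i, w in enumerate(tokens)}
V = len(tokens)

# Count (previous word -> next word) transitions observed in the data.
counts = np.zeros((V, V))
for s in corpus:
    ws = s.split()
    for prev, nxt in zip(ws, ws[1:]):
        counts[idx[prev], idx[nxt]] += 1

# Maximum-likelihood estimate of p(next | prev): normalize each row
# (rows with no observed transitions are left as zeros).
row_sums = counts.sum(axis=1, keepdims=True)
probs = counts / np.where(row_sums == 0, 1, row_sums)

# Negative log-likelihood of the corpus under the fitted model,
# i.e. the quantity a language model's training loop minimizes.
nll = 0.0
for s in corpus:
    ws = s.split()
    for prev, nxt in zip(ws, ws[1:]):
        nll -= np.log(probs[idx[prev], idx[nxt]])

print(f"negative log-likelihood of the corpus: {nll:.3f}")
```

A neural LM does the same thing with a parametric model and gradient descent instead of counting, but the objective (fit the conditional next-token distribution, hence the joint) is the same.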
At the end of the day, training a model is an optimization problem. What would the objective be for the approach you find more intuitive, other than simply fitting the distribution of the data you observe?