blazejd OP t1_ix8k4wf wrote
Reply to comment by blazejd in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
>Sure, but the answer remains: what reward function do you use that encompasses understanding and communicating, on top of grammar?
I realize this doesn't directly answer your question, so my point is that we don't know the answer, but we should at least try to pursue it.
blazejd OP t1_ix8il0p wrote
Reply to comment by waa007 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Can you rephrase the last part of your second sentence? Don't quite get what you mean.
blazejd OP t1_ix8ibmh wrote
Reply to comment by idrajitsc in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
>For actually learning language, in the sense of using it to convey meaningful, appropriate information, which LLMs so far cannot do, maybe it's better to take an RL approach. But I don't know how to write a reward function that encompasses that. So as long as we can't do the superior thing with either approach, we might as well focus on the easier approach to the superficial thing.
My understanding of this paragraph, simply put, is (correct me if I'm wrong): "RL might be better, but we don't know how to do it, so let's not try. Language models are doing fine."
In my opinion, in science we should focus simultaneously on easier problems that can lead to shorter-term gains (language models) AND ALSO on more difficult problems that are riskier but might pay off more in the long term (RL-based approaches).
blazejd OP t1_ix7mr03 wrote
Reply to [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Thank you everyone for your comments; they were really insightful and gave me perspective I wouldn't have reached on my own. I am quite new to ML reddit, so I wasn't sure what to expect. Here is my quick summary / general reply.
Most of you agreed that we use language modelling because it is the most compute- and time-efficient approach and sort of the best thing we have *right now*, but that RL would be interesting to incorporate. However, training solely with RL is difficult, not least because choosing a good objective is hard.
This seems a bit similar to the hype around SVMs in the early 2000s (from what I heard from senior researchers, it was a thing). Back then we already had neural networks, but we weren't ready hardware- and data-wise, so at the time SVMs performed better thanks to their simplicity; twenty years later we can clearly see that neural nets were the right direction. It's easier to use language models now and they give better short-term performance, but in a couple of decades RL will probably outperform them (although multi-modality will very likely be necessary).
A currently feasible step in this direction is merging the two concepts: language models and RL-based feedback. Some papers mentioned are https://arxiv.org/abs/2203.02155 and "Experience Grounds Language" (although I haven't read them in full yet). We could initialize a customer-facing chatbot with a language model and then update it RL-style, which can be thought of as a form of online or continual learning. The RL objective could be the rating a user gives after interacting with the system, how often the user asks to talk to a human assistant, or the sentiment (positive or negative) of user replies. And if we could come up with that by bouncing ideas around on reddit, some company is probably already doing it.
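To make the update concrete, here is a minimal sketch of what I have in mind, assuming a Hugging Face causal LM as the chatbot and a 1-5 user rating as the reward. The model choice, the recentring of the rating, and the REINFORCE-style loss are all my own illustrative assumptions, not something from the papers above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: a small causal LM stands in for the deployed chatbot.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def reinforce_update(prompt: str, reply: str, reward: float) -> None:
    """One REINFORCE step: scale the log-likelihood of the bot's own
    reply by the (recentred) rating the user gave afterwards."""
    ids = tokenizer(prompt + reply, return_tensors="pt").input_ids
    # Approximate reply boundary; tokenizing prompt+reply together can
    # differ slightly from tokenizing the prompt alone, fine for a sketch.
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]

    logits = model(ids).logits
    # Log-probability of each token given everything before it.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    reply_lp = token_lp[:, prompt_len - 1:].sum()  # reply tokens only

    loss = -reward * reply_lp  # positive reward pushes the reply up
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# User rated the exchange 4/5; recentre around 3 so bad chats push replies down.
reinforce_update("User: my order is late.\nBot:", " Sorry, let me check on that.",
                 reward=4 - 3)
```

The same loop would work with the other rewards I mentioned: replace the rating with -1 whenever the user escalates to a human, or with a sentiment score of their next message.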
If you are looking for more related resources, my thoughts were inspired by the field of language emergence (https://arxiv.org/pdf/2006.02419.pdf) and this work (https://arxiv.org/pdf/2112.11911.pdf).
blazejd OP t1_ix7kazf wrote
Reply to comment by Kylaran in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Glad to hear a non-ML perspective on it! Initializing with language models and then using RL for feedback makes a lot of sense. Could you share any particular papers that I could look into?
blazejd OP t1_ix7k18e wrote
Reply to comment by asafaya in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
What language models are doing is indeed modelling the language distribution, but what the ML community wants them to do, and the end goal, is to create a model that learns to understand and communicate in a language. You can see that in the ways we try to evaluate these models, for example by asking them to solve math equations.
blazejd OP t1_ix7jusm wrote
Reply to comment by Nameless1995 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Interesting idea, but it would probably turn into something similar to Yannic Kilcher's 4chan model, which was super toxic because people give the most upvotes to highly controversial topics.
blazejd OP t1_ix7jpz2 wrote
Reply to comment by Nameless1995 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
This is interesting, but I was thinking a bit more high-level. In essence, BERT and GPT are both self-supervised language models trained on passive data with a similar objective.
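To show what I mean by a similar objective, here is a toy sketch (entirely my own; the shapes, the random "model outputs", and the 15% mask rate are just illustrative). Both losses are cross-entropy on tokens hidden from the model; only the choice of which tokens get hidden differs:

```python
import torch
import torch.nn.functional as F

vocab, seq = 100, 8
tokens = torch.randint(vocab, (1, seq))   # a toy "sentence"
logits = torch.randn(1, seq, vocab)       # stand-in for model outputs

# GPT-style next-word prediction: each position predicts the token after it.
clm_loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                           tokens[:, 1:].reshape(-1))

# BERT-style masked prediction: randomly hidden positions are predicted
# from the surrounding context (15% is the usual masking rate).
mask = torch.rand(1, seq) < 0.15
mask[0, 0] = True  # ensure at least one masked position in this toy run
mlm_loss = F.cross_entropy(logits[mask], tokens[mask])
```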
blazejd OP t1_ix7jmyo wrote
Reply to comment by Nameless1995 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Love this response; a lot of it coincides with my own earlier thoughts. I will refer to it in my general reply.
blazejd OP t1_ix7jekd wrote
Reply to comment by idrajitsc in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
I think u/Cheap_Meeting understood my question a bit better here. The end goal is to create an NLP model that can understand and communicate in natural language. This is why the main NLP benchmarks currently cover many different tasks. We use language models because they are the easier approach, not necessarily the better one.
blazejd OP t1_ix7iz8w wrote
Reply to comment by HateRedditCantQuitit in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Indeed, data availability and time and cost effectiveness are some of the sensible reasons I was considering. But we already have chatbots and voice assistants talking to people at scale, don't we?
blazejd t1_jcolu6i wrote
Reply to [D] Unit and Integration Testing for ML Pipelines by Fender6969
I've been trying to figure it out myself and I'm very curious to see other responses.