liyanjia92 OP t1_jdj7h0x wrote
Reply to comment by Extension-Mastodon67 in [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT by liyanjia92
Thanks for trying it out! This is a good example of the difference between the RLHF'ed GPT-2 medium and the vanilla GPT-2 medium. You can see that vanilla GPT-2 medium outputs complete garbage, while the RLHF version tends to come up with some answer for the human (although it failed).
The way I see it, the pre-trained model encodes knowledge of the world, and RLHF is just a way to align the model with human preferences for how to interact with that world.
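To make that idea concrete, here is a minimal conceptual sketch of the RLHF loop, not the repo's actual implementation (which follows the ChatGPT-style PPO setup): sample a response from the policy, score it, and nudge the policy toward higher-reward outputs while penalizing drift from the frozen pre-trained model. The `reward_fn` below is a placeholder assumption standing in for a learned reward model trained on human preference data.

```python
# Rough sketch of RLHF on GPT-2, assuming PyTorch + Hugging Face transformers.
# reward_fn is a toy placeholder; a real setup uses a reward model trained on
# human preference comparisons, and PPO (with clipping and a value head) instead
# of the plain REINFORCE-with-KL update shown here.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
policy = GPT2LMHeadModel.from_pretrained("gpt2-medium")            # model being aligned
reference = GPT2LMHeadModel.from_pretrained("gpt2-medium").eval()  # frozen pre-trained copy
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-5)

def reward_fn(text: str) -> float:
    # Placeholder: a real reward model scores how well the response matches human preference.
    return 1.0 if text.strip().endswith(".") else 0.0

prompt = "Human: What is the capital of France?\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a response from the current policy.
with torch.no_grad():
    out = policy.generate(**inputs, max_new_tokens=20, do_sample=True,
                          pad_token_id=tokenizer.eos_token_id)
prompt_len = inputs["input_ids"].shape[1]
response_text = tokenizer.decode(out[0, prompt_len:])
reward = reward_fn(response_text)

def seq_logprob(model, full_ids, start):
    # Sum of log-probabilities of the generated tokens (positions >= start).
    logits = model(full_ids).logits[:, :-1]
    logp = torch.log_softmax(logits, dim=-1)
    picked = logp.gather(-1, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return picked[:, start - 1:].sum()

policy_lp = seq_logprob(policy, out, prompt_len)
with torch.no_grad():
    ref_lp = seq_logprob(reference, out, prompt_len)

# REINFORCE-style update: reward shaped by a KL penalty toward the reference model,
# so the policy stays close to the knowledge the pre-trained model already encodes.
kl_coef = 0.1
shaped_reward = reward - kl_coef * (policy_lp - ref_lp).detach()
loss = -shaped_reward * policy_lp
optimizer.zero_grad()
loss.backward()
optimizer.step()
```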
You might have seen this tweet before: https://twitter.com/geoffreyhinton/status/1636110447442112513?s=20
So with GPT-2 medium, what we're really doing here is parenting a dumb kid, instead of a "supernaturally precocious child" like GPT-3. What interests me is that RLHF actually does help parent this dumb kid to be more socially acceptable.
In other words, if we had discovered the power of alignment and RLHF earlier, we might have seen the ChatGPT moment much sooner, back when GPT-2 came out in 2019.
I'm also thinking of doing the same with LLaMA, to maybe get a nanoChatGPT that could actually be useful for a real-life application. Stay tuned!
blueSGL t1_jdl02u6 wrote
>So with GPT-2 medium, what we're really doing here is parenting a dumb kid, instead of a "supernaturally precocious child" like GPT-3. What interests me is that RLHF actually does help parent this dumb kid to be more socially acceptable.

>In other words, if we had discovered the power of alignment and RLHF earlier, we might have seen the ChatGPT moment much sooner, back when GPT-2 came out in 2019.
That just reads to me as capability overhang. If there is "one simple trick" to make the model "behave," what's to say this is the only one? (Or that the capabilities derived from the current behavior modification are the best they can be.) Scary thought.