[D] Why do LLMs like InstructGPT and LLM use RL instead of supervised learning to learn from the user-ranked examples?
Submitted by alpha-meta t3_10rpj0f on February 2, 2023 at 1:13 PM in MachineLearning 28 comments 54
alpha-meta (OP) wrote on February 2, 2023 at 9:21 PM, in reply to comment by Jean-Porte:
Ah yes, I see what you mean now, thanks!