alpha-meta OP t1_j6yud7x wrote on February 2, 2023 at 9:21 PM

Reply to comment by Jean-Porte in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta

Ah yes, I see what you mean now, thanks!