bigabig t1_j6z3a6d wrote on February 2, 2023 at 10:18 PM
Reply to [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
I thought this was also because you don't need as much supervised training data, since you 'just' have to train the reward model in a supervised fashion?
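For anyone wondering what "train the reward model in a supervised fashion" can look like: a common formulation fits the reward model on pairs of ranked responses with a pairwise ranking loss, -log(sigmoid(r_preferred - r_rejected)). Below is a rough sketch in plain Python, not the actual InstructGPT code. The linear reward model, the feature vectors, the helper names (`reward`, `train_reward_model`), and the hyperparameters are all made up for illustration.

```python
import math


def pairwise_ranking_loss(r_preferred, r_rejected):
    """Return -log(sigmoid(r_preferred - r_rejected)), computed stably."""
    diff = r_preferred - r_rejected
    # -log(sigmoid(d)) == log(1 + exp(-d)); branch to avoid overflow in exp.
    if diff > 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def reward(weights, features):
    """Hypothetical linear reward model: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    """Fit the weights with plain SGD on (preferred, rejected) feature pairs."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            diff = reward(weights, preferred) - reward(weights, rejected)
            # d(loss)/d(diff) = -sigmoid(-diff) = -1 / (1 + exp(diff))
            grad_scale = -1.0 / (1.0 + math.exp(diff))
            for i in range(dim):
                weights[i] -= lr * grad_scale * (preferred[i] - rejected[i])
    return weights


# Toy, made-up preference data: each entry is (preferred, rejected) features.
pairs = [([1.0, 0.0], [0.0, 1.0]), ([0.8, 0.1], [0.2, 0.9])]
weights = train_reward_model(pairs, dim=2)
```

The only labels this step needs are the rankings themselves. The trained reward model can then score any number of new model outputs during the RL phase without further human labeling, which is roughly the point the comment is making.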
bigabig t1_j6yc5gr wrote on February 2, 2023 at 7:28 PM
Reply to [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Is the automatic transcription done with OpenAI Whisper?