[D] Why do LLMs like InstructGPT and LLM use RL instead of supervised learning to learn from the user-ranked examples? Submitted by alpha-meta t3_10rpj0f on February 2, 2023 at 1:13 PM in MachineLearning 28 comments 54
plocco-tocco t1_j6yc79c wrote on February 2, 2023 at 7:29 PM I would also like to hear from anyone who might have a clue: can RLHF give machine translation a significant boost, that is, produce better language-to-language translations?