/f/MachineLearning

[D] Why do LLMs like InstructGPT use RL instead of supervised learning to learn from the user-ranked examples?

Submitted by alpha-meta t3_10rpj0f on February 2, 2023 at 1:13 PM in MachineLearning

28 comments

54

scraper01 t1_j6zyjd7 wrote on February 3, 2023 at 2:02 AM

The RL loss landscape is richer.
