monouns t1_jbmy24f wrote

But I'm not quite sure how GPT and PPO get trained from feedback relative to each other.
