Nameless1995 t1_ix6hzd0 wrote
Reply to comment by daking999 in [D] Why do we train language models with next word prediction instead of some kind of reinforcement learning-like setup? by blazejd
Yeah, that would be an interesting, scalable way to get human feedback. Perhaps someone is already doing it.
blazejd OP t1_ix7jusm wrote
Interesting idea, but it would probably turn into something similar to Yannic Kilcher's 4chan model, which was super toxic, because people give the most upvotes to highly controversial topics.
daking999 t1_ix7livm wrote
I don't think so; that's not what gets upvoted on Reddit (for the most part, on the popular subreddits). It would be moderate/left-leaning. It might even learn to be funny.