
visarga t1_j2bhlqz wrote

Not just human preferences, but also task distribution. They can fine-tune the model specifically on these tasks to make it even better.

2

SoylentRox t1_j2ckmtj wrote

And there's a bunch of obvious automated training it could do to be specifically better at software coding.

It could complete all the challenges on sites like LeetCode and CodeSignal, learning from its mistakes.

It could be given challenges to take an existing program and make it run faster, learning from a timing analysis.

It could take existing programs and be asked to fix the bugs so it passes a unit test.

It could be asked to write a unit test that makes an existing program fail.

And so on. Ultimately there are millions of separate tasks on which the machine can get objective feedback about how well it did, so it can refine its skills to above human level.
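To make "objective feedback" a bit more concrete, here is a rough Python sketch of what a scoring function for such tasks might look like. It is only an illustration: `score_candidate`, the `add` task, and the test cases are made-up names, not anything from a real training pipeline, and a real system would need to sandbox the generated code rather than `exec` it directly.

```python
import time
from typing import Sequence, Tuple


def score_candidate(source: str, func_name: str,
                    cases: Sequence[Tuple[tuple, object]]) -> dict:
    """Score generated code: fraction of test cases passed, plus wall time.

    Hypothetical helper for illustration only.
    """
    if not cases:
        raise ValueError("need at least one test case")

    namespace: dict = {}
    try:
        # WARNING: exec on untrusted code is unsafe; assume a sandbox in practice.
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:
        # Code that does not even load gets the worst score.
        return {"pass_rate": 0.0, "seconds": None}

    passed = 0
    start = time.perf_counter()
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on a test case counts as a failure
    elapsed = time.perf_counter() - start

    return {"pass_rate": passed / len(cases), "seconds": elapsed}


# Toy task: the "unit test" is a list of (arguments, expected result) pairs.
cases = [((1, 2), 3), ((0, 0), 0), ((5, 5), 10)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return"

good_score = score_candidate(good, "add", cases)
buggy_score = score_candidate(buggy, "add", cases)
broken_score = score_candidate(broken, "add", cases)
```

The pass rate covers the "fix the bug so it passes a unit test" style of task, and the elapsed time is the kind of signal a "make it run faster" task could use. Neither number needs a human to judge it, which is the point.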

3