Ali_M

Ali_M t1_j2xkhl3 wrote

Supervised learning isn't the only game in town, and human demonstrations aren't the only kind of data we can collect. For example, we can record human preferences over model outputs and then use that data to fine-tune models with reinforcement learning (e.g. https://arxiv.org/abs/2203.02155). Even though I'm not a musician, I can still make a meaningful judgement about whether one piece of music is better than another. By analogy, judging an output is often easier than producing it, so we can use human preferences to train models that are capable of superhuman performance.
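To make the preference idea concrete, here is a rough sketch of the first stage only: fitting a reward model to pairwise comparisons with the loss -log sigmoid(r(preferred) - r(rejected)), which is the pairwise ranking loss the linked paper uses for its reward model. Everything else is an assumption for illustration: the linear reward, the two-dimensional features, the `hidden_taste` vector standing in for a human judge, and the helper names. The reinforcement learning step that then optimises a model against the learned reward (PPO in the linked paper) is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, x):
    # Hypothetical linear reward model: a dot product of weights and features.
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(pairs, dim, lr=0.1, epochs=50):
    """Fit w by gradient descent on -log sigmoid(r(preferred) - r(rejected))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(w, preferred) - reward(w, rejected)
            # d/dw of -log sigmoid(margin) is -(1 - sigmoid(margin)) * (preferred - rejected),
            # so a descent step adds the term below.
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * scale * (preferred[i] - rejected[i])
    return w


# Toy data (assumption): a simulated judge compares two outputs and picks one.
# The judge never writes an output or assigns a score; it only chooses.
random.seed(0)
hidden_taste = [2.0, -1.0]
pairs = []
for _ in range(200):
    a = [random.uniform(-1, 1) for _ in range(2)]
    b = [random.uniform(-1, 1) for _ in range(2)]
    if reward(hidden_taste, a) >= reward(hidden_taste, b):
        pairs.append((a, b))
    else:
        pairs.append((b, a))

w = train_reward_model(pairs, dim=2)

# Fraction of training comparisons where the learned reward agrees with the judge.
accuracy = sum(reward(w, p) > reward(w, r) for p, r in pairs) / len(pairs)
print("learned weights:", w)
print("agreement with judge:", accuracy)
```

The point of the toy is that the training signal is nothing but "this one is better than that one", yet the learned reward ends up ranking outputs the way the judge would.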
