alpha-meta OP t1_j72dpto wrote

Good point. So you mean they incorporate things like beam search, temperature changes, top-k sampling, and nucleus sampling into the PPO-based RL optimization?
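For anyone unfamiliar with the decoding knobs mentioned here, this is a rough sketch of how temperature, top-k, and nucleus (top-p) filtering are commonly combined when sampling a single token. The function name `sample_next` and its parameters are made up for illustration, beam search is left out, and nothing here is taken from any particular paper or library.

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample one token index from raw logits (hypothetical helper, illustration only)."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]

    # Candidate indices sorted by probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        order = order[:top_k]

    # Nucleus (top-p): keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p is not None:
        kept, cum = [], 0.0
        for i in order:
            kept.append(i)
            cum += probs[i]
            if cum >= top_p:
                break
        order = kept

    # random.choices renormalizes the remaining weights for us.
    weights = [probs[i] for i in order]
    return rng.choices(order, weights=weights, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding, and a lower temperature sharpens the distribution before any filtering is applied.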

_Arsenie_Boca_ t1_j72g4g4 wrote

I'm not sure if they vary the sampling hyperparameters. The point is that language modelling objectives are to some degree ill-posed, because we calculate the loss on intermediate results (the individual next-token predictions) rather than on the final output that we care about.
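To make that mismatch concrete, here is a toy sketch. The `MODEL` table, the function names, and the exact-match reward are all invented for illustration; a real setup would use a neural LM and a learned reward. The token-level loss scores each step given the gold prefix, whereas the sequence-level score only looks at what the model actually generates on its own.

```python
import math

# Hypothetical toy "model": next-token probabilities given only the previous token.
MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.7, "dog": 0.3},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def token_level_loss(reference):
    """Teacher-forced average negative log-likelihood over the reference tokens."""
    prev, total = "<s>", 0.0
    for tok in reference:
        # Each step is conditioned on the gold previous token, not on model output.
        total += -math.log(MODEL[prev][tok])
        prev = tok
    return total / len(reference)

def greedy_decode(max_len=10):
    """Generate freely: each step is conditioned on the model's own previous choice."""
    prev, out = "<s>", []
    for _ in range(max_len):
        tok = max(MODEL[prev], key=MODEL[prev].get)
        out.append(tok)
        if tok == "</s>":
            break
        prev = tok
    return out

def sequence_reward(output, reference):
    """Stand-in for a sequence-level score (here: exact match, an assumption)."""
    return 1.0 if output == reference else 0.0

reference = ["a", "cat", "</s>"]
generated = greedy_decode()
print("token-level loss:", token_level_loss(reference))
print("generated:", generated)
print("sequence reward:", sequence_reward(generated, reference))
```

In this toy case the per-token loss on the reference is fairly low, yet the freely generated sequence still differs from it, so the final-output score is zero. An RL-style objective optimizes the second quantity directly.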
