
visarga t1_jd0akyj wrote

This is an artefact of RLHF. The model comes out well calibrated after pre-training, but the final RLHF stage breaks that calibration.

https://i.imgur.com/zlXRnB6.png
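The calibration being discussed is the match between a model's stated confidence and its actual accuracy. A common way to measure it is Expected Calibration Error (ECE); here's a minimal sketch (my own illustration, not from the GPT-4 report) with a hypothetical `expected_calibration_error` helper:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |accuracy - confidence| gap per confidence bin,
    weighted by how many predictions fall in each bin.
    A well-calibrated model scores near 0."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by bin population
    return ece

# Toy data that is calibrated by construction: a prediction with
# confidence p is correct with probability p.
rng = np.random.default_rng(0)
conf = rng.uniform(0.05, 0.95, 10_000)
hit = (rng.uniform(size=conf.size) < conf).astype(float)
print(expected_calibration_error(conf, hit))  # close to 0
```

An RLHF-style miscalibration would show up as, e.g., the model answering with 90% confidence while being right only half the time, which drives ECE up.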

Explained by one of the lead authors of GPT-4, Ilya Sutskever - https://www.youtube.com/watch?v=SjhIlw3Iffs&t=1072s

Ilya invites us to "find out" whether we can quickly move past the hallucination phase; maybe this year we will see his work pan out.
