
Hyper1on t1_j9y3vz1 wrote

I mean, I don't see how you get a plausible explanation of BingGPT from underfitting either. As you say, models are underfit on some types of data, but I think the key here is the finetuning procedure, whether normal supervised finetuning or RLHF, both of which optimise for a particular type of dialogue data in which the model is asked to act as an "Assistant" to a human user.
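
For concreteness, here is a minimal sketch of what I mean by "Assistant-style" dialogue data for the two finetuning stages. The field names and structure are purely illustrative, not any vendor's actual schema.

```python
# Hypothetical examples of the two kinds of finetuning data discussed above.
# Field names and structure are illustrative only, not a real training schema.

# 1) Supervised finetuning: a dialogue in which the model is cast as an "Assistant".
sft_example = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What do you think about X?"},
        {"role": "assistant", "content": "Here is a balanced take on X..."},
    ]
}

# 2) RLHF preference data: two candidate replies to the same prompt, with a
#    human label for which is preferred. A reward model is trained on pairs
#    like this, and the policy is then optimised against that reward model.
rlhf_preference_example = {
    "prompt": "Pretend you are Y and answer in character.",
    "chosen": "Sure, speaking as Y: ...",
    "rejected": "I don't want to talk about this anymore.",
}
```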

Part of the reason I suspect my explanation is right is that ChatGPT and BingGPT were almost certainly finetuned on large amounts of dialogue data collected from interactions with users, and yet most of the failure modes of BingGPT that made the media are not stuff like "we asked it to solve this complex reasoning problem and it failed horribly". They instead come from prompts which are very much in distribution for dialogue data, such as asking the model what it thinks about X, or asking the model to pretend it is Y, and you would expect the model to have seen dialogues which start similarly before. I find underfitting on this data to be quite unlikely as an explanation.
