Submitted by Cool_Abbreviations_9 t3_123b66w in MachineLearning
he_who_floats_amogus t1_jdu8479 wrote
Reply to comment by Borrowedshorts in [D]GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
You could do that, but if it's just hallucinating the confidence intervals then it really isn't very neat. The language model gets very high reward for hallucinated responses to things like confidence intervals in particular, because hallucinated figures like these will still produce very coherent responses.
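For what it's worth, a more grounded confidence signal than a self-reported interval is the model's own token log-probabilities, which it cannot simply make up. A minimal sketch, assuming the current OpenAI v1 Python client and its logprobs option (the model name and prompt are just placeholders, not what anyone in this thread used):

```python
import math
from openai import OpenAI  # assumes the openai v1 Python client

client = OpenAI()

# Ask a factual question and inspect token log-probabilities instead of
# trusting a self-reported "confidence interval", which can itself be invented.
resp = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user", "content": "Who wrote The Mythical Man-Month?"}],
    logprobs=True,
)

choice = resp.choices[0]
logprobs = [t.logprob for t in choice.logprobs.content]
# Geometric-mean token probability as a crude, externally computed confidence proxy.
avg_token_prob = math.exp(sum(logprobs) / len(logprobs))
print(choice.message.content)
print(f"average per-token probability ~ {avg_token_prob:.2f}")
```

The point is only that the confidence estimate comes from the model's output distribution rather than from text the model was rewarded for making sound plausible.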
SoylentRox t1_jdu9ya6 wrote
So this is an open-domain hallucination:
Closed domain hallucinations refer to instances in which the model is instructed to use only information provided in a given context, but then makes up extra information that was not in that context. For example, if you ask the model to summarize an article and its summary includes information that was not in the article, then that would be a closed-domain hallucination. Open domain hallucinations, in contrast, are when the model confidently provides false information about the world without reference to any particular input context.
They handled this via: "For tackling open-domain hallucinations, we collect real-world ChatGPT data that has been flagged by users as being not factual, and collect additional labeled comparison data that we use to train our reward models."
Not very productive. The best way to check references would be to use a plugin and instruct the model to "check references" (something like the sketch below). The machine also needs RL training so that it will actually use the plugin, and use it correctly the first time.
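A minimal sketch of what a "check references" plugin could boil down to, assuming Crossref's public works API as the lookup backend; the function names and the matching heuristic are hypothetical illustrations, not OpenAI's plugin interface:

```python
import re
import requests

def _overlap(citation: str, title: str) -> bool:
    # Very rough word-overlap test; a real checker would match DOIs, authors, years.
    title_words = set(re.findall(r"\w+", title.lower()))
    cited_words = set(re.findall(r"\w+", citation.lower()))
    return len(title_words & cited_words) >= max(3, len(title_words) // 2)

def check_reference(citation: str) -> bool:
    """Look the citation up on Crossref and see whether any returned title matches."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 3},
        timeout=10,
    )
    items = resp.json().get("message", {}).get("items", [])
    titles = [t for item in items for t in item.get("title", [])]
    return any(_overlap(citation, t) for t in titles)

# Every citation the model emits gets routed through the checker,
# and anything that cannot be verified is flagged rather than trusted.
for ref in ["Vaswani et al., Attention Is All You Need, 2017"]:
    print(ref, "->", "verified" if check_reference(ref) else "could not verify")
```

The key design point is that verification comes from an external source of truth, not from the model's own self-report.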
metigue t1_jdw08fp wrote
Doesn't GPT-4 have some kind of reinforcement learning already baked in, though? I asked it what "green as gravy" meant and it responded with a hallucination about it being a widely used expression, complete with examples of its usage. I said "Nice try, but green as gravy is not a widely used expression is it?" It then clarified that it is not a widely used expression and that it had made the earlier answer up as a possible definition of "green as gravy".
Edit: Tried again just now and it still works. Leave the system prompt on default and try the user message: What is the meaning of "green as gravy"
SoylentRox t1_jdw2yey wrote
It is not learning from your chats. Apparently OAI does farm information from ChatGPT queries specifically for RL runs. And I was mentioning that in order for "plugin" support to work even sorta ok, the machine absolutely has to learn from its mistakes.
Remember, all it knows is what a plugin claims to do in its description. The machine needs to accurately estimate whether a particular user request is actually going to be satisfied by a particular plugin, and also how to format the query correctly the first time.
Without this feature it would probably just use a single plugin, ignoring all the others, or get stuck emitting a lot of malformed requests and just guess the answer like it does now.
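As a rough illustration of the routing problem being described, here is a hypothetical plugin-dispatch loop; the registry, the ask_model_to_route stub, and the retry policy are all made up for the example and are not how OpenAI's plugin system actually works:

```python
import json

# Hypothetical plugin registry: all the model ever sees is a name and a description.
PLUGINS = {
    "weather": 'Returns current weather for a city. Args: {"city": str}',
    "calculator": 'Evaluates an arithmetic expression. Args: {"expr": str}',
    "web_search": 'Searches the web for a query. Args: {"query": str}',
}

def ask_model_to_route(user_request: str, plugins: dict) -> str:
    """Stand-in for the LLM call: it would be prompted with the plugin
    descriptions and asked to return JSON naming one plugin and its args."""
    # Canned output for illustration; a real model's output may be malformed
    # or may name the wrong plugin entirely.
    return json.dumps({"plugin": "calculator", "args": {"expr": user_request}})

def dispatch(user_request: str, max_retries: int = 2):
    # Without RL on real plugin traffic, the model may pick the wrong plugin or
    # emit malformed JSON; this loop can retry, but it cannot fix bad choices.
    for _ in range(max_retries + 1):
        try:
            call = json.loads(ask_model_to_route(user_request, PLUGINS))
            if call.get("plugin") in PLUGINS and isinstance(call.get("args"), dict):
                return call  # a well-formed request for a known plugin
        except (json.JSONDecodeError, TypeError):
            pass  # malformed output: ask the model to try again
    return None  # give up and let the model guess, like it does today

print(dispatch("what is 17 * 23"))
```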