Submitted by Cool_Abbreviations_9 t3_123b66w in MachineLearning
Borrowedshorts t1_jdu1o78 wrote
So if you're using this for academic research, you can put in your original prompt and then tell it to only return references with a confidence score > .5. Neat little trick.
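A minimal sketch of that prompt pattern, assuming the pre-1.0 `openai` Python client (the model name and prompt wording are just illustrative, and note the scores are self-reported by the model, not calibrated probabilities):

```python
import openai  # pre-1.0 client interface

PROMPT = """List key references for <your topic>.
For each reference, add a confidence score between 0 and 1 for how certain you
are that the reference actually exists, and only return references with a
confidence score greater than 0.5."""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": PROMPT}],
)
print(response["choices"][0]["message"]["content"])
```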
he_who_floats_amogus t1_jdu8479 wrote
You could do that, but if it's just hallucinating the confidence scores then it really isn't very neat. The language model gets very high reward for hallucinated responses to things like confidence scores in particular, because hallucinating figures like this will still produce very coherent responses.
SoylentRox t1_jdu9ya6 wrote
So this is an open-domain hallucination:
Closed-domain hallucinations refer to instances in which the model is instructed to use only information provided in a given context, but then makes up extra information that was not in that context. For example, if you ask the model to summarize an article and its summary includes information that was not in the article, then that would be a closed-domain hallucination. Open-domain hallucinations, in contrast, are when the model confidently provides false information about the world without reference to any particular input context.
They handled this via: "For tackling open-domain hallucinations, we collect real-world ChatGPT data that has been flagged by users as being not factual, and collect additional labeled comparison data that we use to train our reward models."
Not very productive. The best way to check references would be to use a plugin plus an instruction to the model to "check references". The machine also needs RL training so that it will use the plugin, and use it correctly, the first time.
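For illustration, a rough sketch of what a "check references" step could do outside the model: verify each cited title against the public Crossref API instead of trusting self-reported confidence. The matching heuristic here is deliberately crude; a real plugin would do much more.

```python
import requests

def reference_exists(title: str, min_overlap: float = 0.8) -> bool:
    """Ask Crossref whether a work with a closely matching title exists."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    )
    items = resp.json()["message"]["items"]
    if not items or not items[0].get("title"):
        return False
    found = items[0]["title"][0].lower().split()
    asked = title.lower().split()
    # crude token-overlap check; a real plugin would also compare authors, year, DOI
    return len(set(asked) & set(found)) / max(len(asked), 1) >= min_overlap

# e.g. reference_exists("Attention Is All You Need") should come back True
```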
metigue t1_jdw08fp wrote
Doesn't GPT-4 already have some kind of reinforcement learning baked in, though? I asked it what "green as gravy" meant and it responded with a hallucination about it being a widely used expression, complete with examples of its usage. I said, "Nice try, but green as gravy is not a widely used expression, is it?" It then clarified that it is not a widely used expression and that it had made the definition up as a possible meaning of "green as gravy".
Edit: Tried again just now and it still works. Leave the system message on default and try the user message: What is the meaning of "green as gravy"?
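If anyone wants to reproduce this programmatically, a minimal two-turn sketch with the pre-1.0 `openai` client (no system message set, matching "default"; the exact replies will of course vary):

```python
import openai

messages = [{"role": "user", "content": 'What is the meaning of "green as gravy"?'}]
first = openai.ChatCompletion.create(model="gpt-4", messages=messages)
answer = first["choices"][0]["message"]["content"]
print(answer)  # the comment above reports a confidently fabricated definition here

# push back and see whether it retracts the claim
messages += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": 'Nice try, but "green as gravy" is not a widely used expression, is it?'},
]
second = openai.ChatCompletion.create(model="gpt-4", messages=messages)
print(second["choices"][0]["message"]["content"])
```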
SoylentRox t1_jdw2yey wrote
It is not learning from your chats. Apparently OpenAI does harvest information from ChatGPT queries specifically for RL runs. And I was pointing out that in order for "plugin" support to work even sorta OK, the machine absolutely has to learn from its mistakes.
Remember, all it knows about a plugin is what its description claims it does. The machine needs to accurately estimate whether a particular user request will actually be satisfied by a particular plugin, and also how to format the query correctly the first time.
Without this feature it would probably just use a single plugin, ignoring all the others, or get stuck emitting malformed requests and fall back to guessing the answer like it does now.
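To make that concrete, here's a toy illustration (plugin names and descriptions made up) of the only information the model actually has when deciding which plugin to call and how:

```python
# The model only sees short natural-language descriptions, so plugin choice and
# query formatting both hinge on matching the request against these strings
# correctly on the first attempt.
PLUGINS = {
    "paper_search": "Search academic databases for papers by title, author, or DOI.",
    "web_browser": "Fetch and read the contents of a URL.",
    "calculator": "Evaluate arithmetic expressions.",
}

def build_routing_prompt(user_request: str) -> str:
    tool_list = "\n".join(f"- {name}: {desc}" for name, desc in PLUGINS.items())
    return (
        "You may call exactly one of these tools:\n"
        f"{tool_list}\n\n"
        f"User request: {user_request}\n"
        'Reply with JSON only: {"tool": "<name>", "input": "<string>"}'
    )
```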
master3243 t1_jdudizk wrote
Who needs statistical tests with theoretical grounding and justified/repeatable results when you've got LLMs™
mizmato t1_jdvgcla wrote
I've seen too many posts on Reddit trying to justify X by saying ChatGPT told them to do it (e.g., asking ChatGPT to do their taxes and then submitting the results). LLMs are something else.
yaosio t1_jduzcus wrote
It can also return hallucinated results from a real source. I've had Bing Chat fabricate paragraphs from real papers. The sidebar can see pages and documents, and even when it's looking at the PDF of the paper it will still make things up.
ypxkap t1_jdxwirl wrote
The Bing Chat thing is interesting because it can't seem to tell when it can't see the whole page. For example, if you ask it "what's the last line of this webpage?", you'll get some line X words in (usually ~1,100 words for me, but it's been a while since I checked). If you then send it text from after that "last sentence", it will act as if it had been looking at it the whole time, but as far as I can tell it has no capacity to notice that text otherwise. I asked it to summarize a chat-log .txt file I had loaded into Edge, and its summary said there was an advertisement for an iPhone 14 and that the "user threatened to harm the AI", neither of which was present in the text file. That gives me the impression it's seeing something completely different from what Edge is displaying, something that also includes instructions on how to respond in certain scenarios, including being threatened.
muskoxnotverydirty t1_jdvak20 wrote
We've already seen similar prompts, such as telling it to say "I don't know" when it doesn't know and then priming it with examples of it saying "I don't know" in response to nonsense. Maybe there's something to the added work of generating an output and then iteratively self-critiquing it to reach a better final output.
I wonder if they could be using this idea to automatically and iteratively generate and improve their training dataset at scale, which would create a sort of virtuous cycle of improve dataset -> improve LLM -> repeat.
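As a sketch of what that generate-critique-revise loop might look like (the `ask` callable is a placeholder for whatever chat API is in use; nothing here is OpenAI's actual pipeline):

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def refine(question: str, ask: Callable[[List[Message]], str], rounds: int = 2) -> str:
    """Generate an answer, then iteratively critique and revise it."""
    answer = ask([{"role": "user", "content": question}])
    for _ in range(rounds):
        critique = ask([
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": "List any claims in the answer above that may be "
                                        "hallucinated or unsupported."},
        ])
        answer = ask([
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": f"Rewrite the answer, fixing these issues:\n{critique}"},
        ])
    return answer

# The (answer, critique, revision) triples could then be logged as candidate training data.
```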