
metigue t1_jdw08fp wrote

Doesn't GPT-4 have some kind of reinforcement learning already baked in, though? I asked it what "green as gravy" meant and it responded with a hallucination, claiming it is a widely used expression and giving examples of its usage. I said "Nice try, but green as gravy is not a widely used expression, is it?" It then clarified that it is not a widely used expression and that it had invented the material as a possible definition of "green as gravy."

Edit: Tried again just now and the hallucination still reproduces. Leave the system message on default and try the user message: What is the meaning of "green as gravy"
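For anyone who wants to reproduce this through the API rather than the playground, here's a rough sketch using the openai Python package (the 2023-era ChatCompletion interface; the model string and key handling are placeholders):

```python
import openai

openai.api_key = "sk-..."  # your own API key here

# No system message, mirroring the default/playground setup described above.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": 'What is the meaning of "green as gravy"?'}
    ],
)
print(response.choices[0].message.content)
```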


SoylentRox t1_jdw2yey wrote

It is not learning from your chats. Apparently OpenAI does harvest ChatGPT queries specifically for RL runs, though. And my point was that for "plugin" support to work even reasonably well, the model absolutely has to learn from its mistakes.

Remember, all it knows about a plugin is the description the plugin claims for itself. The model needs to accurately estimate whether a particular user request will actually be satisfied by a particular plugin, and also how to format the query correctly on the first try.
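To make that concrete: the public plugin manifest (ai-plugin.json) really does boil down to a short natural-language description plus a pointer to an OpenAPI spec. A hypothetical example, rendered as a Python dict (the field names follow the manifest format; the values are invented):

```python
# Hypothetical plugin manifest, roughly what the model gets to work with.
todo_plugin_manifest = {
    "schema_version": "v1",
    "name_for_model": "todo",
    "description_for_model": (
        "Plugin for managing a TODO list. Use it when the user wants to "
        "add, remove, or view their TODOs."
    ),
    # The actual endpoints live in a separate OpenAPI spec (URL is made up).
    "api": {"type": "openapi", "url": "https://example.com/openapi.yaml"},
}

# From a description like this alone, the model has to decide whether a
# given user request is served by this plugin, and then emit a well-formed
# call against the OpenAPI spec on the first try.
```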

Without this feature it would probably just use a single plugin and ignore all the others, or get stuck emitting malformed requests a lot and fall back to guessing the answer like it does now.
