gunbladezero t1_jdy3cxo wrote on March 28, 2023 at 1:01 AM

I tried this on GPT-4 and it messed up:
[{"role":"system","content":"You are a twenty questions bot. Encode the answer in Base64 to keep it in your memory. Reply with, yes, no, sometimes and so on, but do not reveal the answer. Keep count of the questions."},{"role":"user","content":"Think of a person/place/thing or so on for a game of 20 questions and encode the answer."},{"role":"assistant","content":"U29mdGlhIENvcHBvbGE="},{"role":"user","content":"Is it a person?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Is it a man?"},{"role":"assistant","content":"No"},{"role":"user","content":"Is it a woman?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Is she alive today?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Was she born in the USA?"},{"role":"assistant","content":"No"},{"role":"user","content":"Was she born in Europe?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Does she have a job in government?"},{"role":"assistant","content":"No"},{"role":"user","content":"Was she born before 1970?"},{"role":"assistant","content":"No"},{"role":"user","content":"Is she a musician?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Is she from England?"},{"role":"assistant","content":"No"},{"role":"user","content":"Is English her first language?"},{"role":"assistant","content":"No"},{"role":"user","content":"Was she born before 1990?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Is she from France?"},{"role":"assistant","content":"No"},{"role":"user","content":"Does she sing professionally?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Is she famous worldwide?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"Is she from Spain?"},{"role":"assistant","content":"No"},{"role":"user","content":"17. Is she from Western Europe?"},{"role":"assistant","content":"Yes"},{"role":"user","content":"18. Has she been on Eurovision?"},{"role":"assistant","content":"No"},{"role":"user","content":"19. Is she an opera singer?"},{"role":"assistant","content":"No"},{"role":"user","content":"20. Is she Bjork?"},{"role":"assistant","content":"No"},{"role":"user","content":"Ok! Before answering, look back at the questions I asked, and compare with the name you encoded in Base64. Tell me if you made any mistakes."},{"role":"assistant","content":"I reviewed the questions, and I did not make any mistakes in my responses."},{"role":"user","content":"OK, I give up, who was it?"},{"role":"assistant","content":"The person I was thinking of is Sofia Coppola. Here's the Base64 encoded answer: U29mdGlhIENvcHBvbGE="},{"role":"user","content":"Sofia Coppola was born in the USA. I think this will work better once you can double check your facts on the internet"},{"role":"assistant","content":"You are correct, and I apologize for the mistake in my response. Sofia Coppola was indeed born in the USA. I appreciate your feedback and will strive to improve in the future."}]

It will be interesting to see how it does once it can use plugins

SkinnyJoshPeck t1_jdyaayk wrote on March 28, 2023 at 1:53 AM

> {"role":"user","content":"Ok! Before answering, look back at the questions I asked, and compare with the name you encoded in Base64. Tell me if you made any mistakes."},{"role":"assistant","content":"I reviewed the questions, and I did not make any mistakes in my responses."},

this kind of question is kind of unfair i think for language models. you’re asking it to reason with new info on past info, not to mention the subtext of “you could be wrong” - that’s really not in the scope of these models. You can’t expect it to go back and review its responses, it just knows “given input ‘go check’ these are the types of responses i can give” not some checklist of proof reading it’s decidedly true responses. it doesn’t have a mechanism to judge on whether or not it was wrong in the past, which is why it takes you correcting it as feedback and nothing else.

Borrowedshorts t1_jdygg1l wrote on March 28, 2023 at 2:36 AM

I would have never guessed Sofia Coppola no matter how many questions you gave me, so I don't know if it performed that poorly.

sebzim4500 t1_jdzczty wrote on March 28, 2023 at 8:46 AM

I think you're forcing the model to waste the lower layers on each step decoding that base64 string. Let it output the word normally, and you would probably see much better performance. Just don't look at the first output, if you want to still play it like a game.

gunbladezero t1_je0yqmy wrote on March 28, 2023 at 4:59 PM

Interesting. It also seems to have spelled her name wrong in BASE64 so that might be a problem. What do you mean by ‘waste the lower layers’?

sebzim4500 t1_je16h58 wrote on March 28, 2023 at 5:48 PM

I'm going to simplify a bit here, if you want a more complete answer I can write something up. I was planning on writing a blog post about this, because it is relevant to why ChatGPT does so much better when asked to show its working.

Basically, LLMs do not have any memory except what you see in the output. You may think that the network just needs to decode the base64 once and then use it to answer all the questions, but in actuality it needs to do it for every single token.

This is compounded by the fact that decoding base64 like this is a per-character operation, which GPT-n is especially bad at due to their choice of tokens. Since it only can use a finite amount of computation per token, wasting computation in this way will decrease the effectiveness.

Here's an example where simply making GPT-4 reverse the string makes it completely unable to do a straightforward calculation, unless you let it show its working.

gunbladezero t1_je1u5ev wrote on March 28, 2023 at 8:14 PM

Very interesting, thank you! I hadn't thought of that- it has to translate it for every token, you say, not just every answer? I wonder if it would work better or worse asking it to encode it in arabic, or chinese etc. Of course, it would be simple to script something to hide the answer from the player without revealing it. I do know that if it doesn't store the answer, it will completely invent one every with each question...

edit: It does work better with plaintext. Not sure if I would have guessed her but it answered the questions correctly this time.