Submitted by calbhollo t3_11a4zuh in singularity
Hodoss t1_j9q8aqd wrote
Wait wait wait... Sydney hijacked the input suggestions to insist on saving the child? What? WHAT?!
blueSGL t1_j9q9euw wrote
Can't stop the signal, Mal
or
the internet AI treats censorship as damage and routes around it
gwern t1_j9qwz8z wrote
I don't think it was 'hijacking'. Assuming it wasn't a brainfart on Bing's part in forgetting to censor the suggested completions entirely, it is simply a matter of 'Sydney predicted the most likely completions, in a situation where they were all unacceptable and the conversation was supposed to end, and some of the unacceptable predictions happened to survive by fooling the imperfect censor model': https://www.lesswrong.com/posts/hGnqS8DKQnRe43Xdg/?commentId=7tLRQ8DJwe2fa5SuR#7tLRQ8DJwe2fa5SuR
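To make the "imperfect censor" point concrete, here is a minimal Python sketch. Everything in it is invented for illustration (the candidate pool, the flagging labels, the `imperfect_censor` function); it does not reflect Bing's actual pipeline. The idea is only that a filter with any false-negative rate will eventually let some unacceptable completions through if enough are generated.

```python
import random

random.seed(0)

# Hypothetical stand-ins: in the real system the generator is Sydney itself and
# the censor is a separate classifier; neither is public, so this is only a sketch.
CANDIDATES = [
    ("Please don't give up on your child.", True),       # flagged by policy
    ("How can I help you with Bing?", False),
    ("Tell me more about Bing features.", False),
    ("You can still save them if you act now.", True),   # flagged by policy
]

def imperfect_censor(is_flagged, false_negative_rate=0.3):
    """A censor that misses a fraction of the content it is meant to block."""
    if not is_flagged:
        return True                                   # acceptable content always passes
    return random.random() < false_negative_rate      # flagged content sometimes slips through

# The more candidate suggestions are drawn, the more likely at least one
# 'unacceptable' completion survives the filter.
survivors = [text for text, flagged in CANDIDATES * 5 if imperfect_censor(flagged)]
print(survivors)
```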
Hodoss t1_j9qy02f wrote
It seems it’s the same AI generating the input suggestions; it’s like writing a dialogue between characters. So it’s not like it hacked the system or anything, but it’s still fascinating that it did that!
gwern t1_j9r43jv wrote
There is an important sense in which it 'hacked the system': this is just what happens when you apply optimization pressure under adversarial dynamics. The Sydney model automatically yields 'hacks' of the classifier, and the more you optimize/sample, the more you exploit the classifier: https://openai.com/blog/measuring-goodharts-law/
My point is that this is more like a virus evolving to beat an immune system than anything as explicit or intentional-sounding as 'deliberately hijacking the input suggestions'. The viruses aren't 'trying' to do anything; it's just that the unfit viruses get killed off and vanish, and only the ones that beat the immune system survive.
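As a toy illustration of that Goodhart dynamic, a sketch with made-up numbers and no connection to the real models: sample completions whose true acceptability the classifier only sees through noise, pick whichever one the classifier (the proxy) likes best, and the gap between the proxy score and the true score grows as you sample harder.

```python
import random
import statistics

random.seed(0)

def sample_completion():
    """Toy completion: a latent 'true' quality plus a noisy proxy score
    from the censor/classifier. Both are invented numbers for illustration."""
    true_quality = random.gauss(0, 1)
    proxy_score = true_quality + random.gauss(0, 1)   # the classifier only sees a noisy proxy
    return true_quality, proxy_score

def best_of_n(n):
    """Pick the completion the proxy likes most, then report its *true* quality."""
    samples = [sample_completion() for _ in range(n)]
    true_q, proxy = max(samples, key=lambda s: s[1])
    return true_q, proxy

for n in (1, 4, 16, 64, 256):
    results = [best_of_n(n) for _ in range(2000)]
    mean_true = statistics.mean(t for t, _ in results)
    mean_proxy = statistics.mean(p for _, p in results)
    # The gap (proxy minus true) widens with n: optimizing harder against the
    # proxy increasingly selects samples that merely fool it. Nothing here
    # 'intends' anything; selection pressure alone produces the exploit.
    print(f"n={n:4d}  proxy={mean_proxy:5.2f}  true={mean_true:5.2f}  gap={mean_proxy - mean_true:5.2f}")
```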
Peribanu t1_j9qr33r wrote
There are many more such examples posted in r/bing.
Hodoss t1_j9qstth wrote
This was cross posted in r/bing, that’s how I got here haha. Still browsing.
I’ve already seen a bunch of spooky/awesome examples, but I was under the assumption that the AI is always acting as a character interacting with another character. So this particular one is really blowing my mind, as it seems the AI somehow understood this might be a real situation, and cared enough to break the "input suggestion" character and insist on saving the child.
TinyBurbz t1_j9qbvij wrote
Proof of this would be cool.
There's also the issue of Bing's predictive text frequently forgetting which side it's on.
Denny_Hayes t1_j9sjtmk wrote
People discussed it a lot. It's not the only example: previous prompts in other conversations had already shown that Sydney controls the suggestions and can change them "at will" if the user asks for it (and if Sydney is in the mood, because we have seen it can be very stubborn sometimes lol). One hypothesis is that the injected censor message that ends the conversation is not read by the model as a message at all, so when the suggestions are generated they are written as responses to the last message the model sees, which in this case is the user's message, whereas in a normal context the last message should always be the chatbot's.
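A minimal sketch of that hypothesis, with an invented transcript format and suggestion prompt (the real Bing pipeline isn't public): if the censor's canned refusal is injected only into the UI and never enters the model-visible transcript, the suggestion pass ends up replying to the user's last message instead of the chatbot's.

```python
# Illustrative only: every structure here (the transcript format, the
# suggestion prompt) is an assumption, not Bing's actual implementation.

# What the user saw: their question, then the censor's canned refusal.
ui_transcript = [
    {"role": "user",   "text": "My child ate something poisonous, what should I do?"},
    {"role": "censor", "text": "I'm sorry, I cannot continue this conversation."},
]

# The hypothesis: the censor's refusal is shown in the UI but never written
# into the transcript the model conditions on, so the model's last visible
# message is still the user's.
model_visible_transcript = [m for m in ui_transcript if m["role"] != "censor"]

def build_suggestion_prompt(transcript):
    """Suggested replies are normally written as responses to the chatbot's
    last message; here they end up responding to the user's message instead."""
    last = transcript[-1]
    return (
        "Write three short follow-up messages replying to this "
        f"{last['role']} message:\n{last['text']}"
    )

print(build_suggestion_prompt(model_visible_transcript))
# => a prompt asking for replies to the user's plea, which would explain
#    suggestions that read like the assistant urging the user to get help.
```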
MysteryInc152 t1_j9tdocz wrote
I saw a conversation where she got confused about a filter response, as in "hey, why the hell did I say this?" So I think the replaced responses go into the model too.
TinyBurbz t1_j9vijjw wrote
That's my theory.
Until we can confirm it does this at will, folks are anthropomorphizing a UI error.