draconicmoniker t1_izedob5 wrote
Is there a way to log, locate or visualize the contents or knowledge of the neural network at the moment when deception is happening? As the first system to reliably demonstrate AI that can deceive, it would be very informative to build more tools around how to search for and detect it in general.
MetaAI_Official OP t1_izfo503 wrote
While many players do lie in the game, the best players do so very infrequently because it destroys the trust they’ve built with other players. Our agent generates plans for itself as well as for other players that could benefit them, and it tries to have discussions based on those plans. It doesn’t always follow through with what it previously discussed with a player because it may change its mind about what moves to make, but it does not intentionally lie in an effort to mislead opponents. We're excited about the opportunities for studying problems like this that Diplomacy as an environment could provide for researchers interested in exploring this question; in fact, some researchers have already studied human deception in Diplomacy: https://vene.ro/betrayal/niculae15betrayal.pdf and https://www.cs.cornell.edu/~cristian/Deception_in_conversations_files/deception-conversational-dataset.pdf. -AM
MetaAI_Official OP t1_izfq2c8 wrote
[Goff] Two thoughts on this:
Seeing under the hood like this was fascinating, and seeing how the model responded to the messages human players sent was great. That is more about detecting when people lie than the other way around, though.
On the actual question you asked, Alex is spot on that CICERO only ever "lied" by accident - you could see that when it sent messages it meant them, then it genuinely changed its plan later.