Submitted by computing_professor t3_ynouh9 in deeplearning

I asked this a couple weeks ago in /r/learnmachinelearning but maybe it will get some more traction here.

I'm familiar with self-play for training a player in games like Chess, Checkers, and many other games where the move sets and goals are the same for both players - e.g. eliminate the other player's pieces, or prevent the other player from moving (as in Amazons). But what about training a computer player for a game where one side has a different goal than the other, such as Maker-Breaker games? I imagine I would want to train a strong Maker and separately train a strong Breaker, so it seems less like a straightforward application of Reinforcement Learning and more like a Generative Adversarial Network. But the task isn't really generative, so it's not a simple GAN application either. I imagine it's still an RL problem, but I'm interested in how to go about it.

Does anyone have any reading on these types of problems? Thanks.

12

Comments


dualmindblade t1_iva5ncr wrote

Disclaimer: not even close to an expert, I just keep up with the state of the field

If you were using something like the AlphaZero algorithm, I'm fairly certain the asymmetry is not an issue; it would work unmodified. I also don't think you'd want to use two models, since that would weaken the play. The argument is that the NN part is trying to intuit the properties of a big tree search in which both players are participants, so it must understand both strategies about equally well regardless of which side it's playing. It's no different from a human player: when you make a move, the next step is to consider the resulting position from the other side and evaluate their potential moves. BTW, chess is not super symmetric in practice; usually black needs to adopt a defensive strategy in the opening.
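
To make the "one model, both sides" point concrete, here's a minimal sketch (my own toy illustration, not from any paper) of how you might encode a Maker-Breaker position so a single network always sees whose turn it is. The plane layout and the `encode_state` helper are just assumptions for the example:

```python
import numpy as np

def encode_state(maker_claims, breaker_claims, maker_to_move):
    """Stack the two ownership planes plus a constant plane saying whose turn it is."""
    turn_plane = np.full_like(maker_claims, 1.0 if maker_to_move else 0.0)
    return np.stack([maker_claims, breaker_claims, turn_plane])

# Toy 3x3 example: Maker owns the centre, Breaker owns a corner, Maker to move.
maker = np.zeros((3, 3)); maker[1, 1] = 1.0
breaker = np.zeros((3, 3)); breaker[0, 0] = 1.0
x = encode_state(maker, breaker, maker_to_move=True)  # shape (3, 3, 3)
```

One network fed states like this learns to evaluate positions for whichever side is about to act, which is all the self-play search needs.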

4

computing_professor OP t1_iva7s6f wrote

Cool, thanks for the reply. With chess, I always assumed it was just examining the state as a pair (board, turn), regardless of who went first. I study the mathematics of combinatorial games and it's rare to ever consider who moves first, as it's almost always more interesting to determine the best move for any given game state.

Do you have any reading suggestions for understanding AlphaZero? I've read surface level/popular articles, but I'm a mathematician and would like to dig deeper into it. And, of course, learn how to apply it in my case.

3

dualmindblade t1_iva9p3g wrote

I would suggest reading the original AlphaGo paper (it's extremely digestible), then skimming the AlphaZero one; there's less detail there because the architecture is very similar, and actually simpler than the original. Think of AlphaZero as a scheme for improving the loss function; the actual architecture of the NN part is sort of unimportant. You can think of it as a black box, or maybe a black box with two smaller boxes sticking out of it.
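
For a picture of the "two smaller boxes", here's a rough sketch of a policy head plus a value head on a shared trunk. The layer sizes and names are placeholders I made up, not anything from the papers:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Shared trunk feeding a policy head (move logits) and a value head (scalar in [-1, 1])."""
    def __init__(self, n_inputs, n_moves, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_moves)                      # which move looks good
        self.value_head = nn.Sequential(nn.Linear(hidden, 1), nn.Tanh())   # how good the position is

    def forward(self, x):
        h = self.trunk(x)
        return self.policy_head(h), self.value_head(h)

net = PolicyValueNet(n_inputs=27, n_moves=9)   # e.g. a flattened 3x3x3 state encoding
policy_logits, value = net(torch.zeros(1, 27))
```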

3

sckuzzle t1_ivbiskk wrote

> so it must understand both strategies about equally regardless of which side it's playing

What do you mean here by "understand"? My understanding is that the state-of-the-art AI has no concept of what the capabilities of its opponent are or even what its opponent might be thinking; it only understands how to react in order to maximize a score.

So while you could train it to react well no matter which side it is playing, how would it benefit from being able to play the other side better? It would need to spin up a duplicate of itself to play the other side and then analyze itself to understand what is happening, but then it would just get into an infinite loop as its duplicate self spins up its own duplicate.

I guess what I'm getting at is that these AI algorithms have no theory of mind. They are simple stimulus-react models. Even the concept of an opposing player is beyond them - it'd be the same whether it was playing solitaire or chess.

1

dualmindblade t1_ivbs4ra wrote

By the other side, I meant the other side of the board, but let's explore your ideas a bit in the context of board game algorithms. In the case of the AlphaZero algorithm, the opponent is itself. The neural network part of AlphaZero acts as a sort of intuition engine, and it's trying to intuit two related but actually different things: (1) the value of a particular move, how good or bad it is, and (2) which move AlphaZero itself will be likely to choose after it has thought about it for a long time. By thinking, I mean running many, many simulated games from the current position, making random moves probabilistically weighted by intuition (1). This is the novel idea of the algorithm, and it allows it to drastically magnify the amount of data used to train the neural network: instead of having to play an entire game to get one tiny bit of feedback, it gets feedback for every possible move every turn, and the network weights are updated based on how well it predicts its own behavior. There's growing evidence that animal brains do something similar; this is called the predictive processing model of cognition.

Anyway, I want to point out that this very much seems like a theory of mind, except it's a theory not of another mind but of its own. BTW, AlphaZero becomes, after training, ridiculously good not only at predicting its own behavior but at predicting the value of a move. The Go-playing version can beat all but the very best professional players without doing any tree search whatsoever, in other words making moves using only a single pass through the NN part of the architecture (the intuition) and not looking even one move ahead. Likewise it is remarkably accurate, though not perfectly so, at predicting its final decision after searching the game tree, so its conception of self is accurate.
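
If it helps, here's a hedged sketch of what that training signal looks like in code: the policy head is pushed toward the move distribution the search actually produced, and the value head toward the eventual game outcome. Variable names are my own placeholders:

```python
import torch.nn.functional as F

def alphazero_style_loss(policy_logits, value_pred, search_policy, outcome):
    """Train the net to predict its own search behavior and the final result."""
    # search_policy: normalized MCTS visit counts, shape (batch, n_moves)
    # outcome: +1 / 0 / -1 from the player-to-move's perspective, shape (batch,)
    policy_loss = -(search_policy * F.log_softmax(policy_logits, dim=-1)).sum(dim=-1).mean()
    value_loss = F.mse_loss(value_pred.squeeze(-1), outcome)
    return policy_loss + value_loss
```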

Now there's another game-playing engine called Maia, which is designed not to beat humans but to play like they do, and it's quite good at this. It can imitate the play of very good amateurs all the way up to professionals. There's absolutely no reason this couldn't be integrated into the AlphaZero algorithm, providing it with not only a theory of its own mind but that of a (generic) human player. And if you don't like that generic part, there are engines fine-tuned on single humans, usually professional players with a lot of games in the database. So basically, yes, they are stimulus-react models and always will be, but they're complicated ones where the majority of the stimulus is generated internally, and humans probably are too. And they are capable even today of having a theory of mind by any reasonable definition of what that means.

1

InfuriatinglyOpaque t1_ivb9otw wrote

In terms of papers using relevant tasks, the closest example I can think of might be the "hide and seek" paradigm used by Weihs et al. (2019), which I include in my list below (not my area of expertise - so there could easily be far more relevant papers out there that I'm not aware of). I wouldn't be the least bit surprised though if there were lots of relevant ideas that you could take from prior works using symmetric games as well, so I've also included a wide variety of other papers that all fall under the broad umbrella of modeling game-learning/game-behavior.

References

Aggarwal, P., & Dutt, V. (2020). Role of information about opponent's actions and intrusion-detection alerts on cyber-decisions in cybersecurity games. Cyber Security. https://www.ingentaconnect.com/content/hsp/jcs/2020/00000003/00000004/art00008

Amiranashvili, A., Dorka, N., Burgard, W., Koltun, V., & Brox, T. (2020). Scaling Imitation Learning in Minecraft. http://arxiv.org/abs/2007.02701

Bramlage, L., & Cortese, A. (2021). Generalized Attention-Weighted Reinforcement Learning. Neural Networks. https://doi.org/10.1016/j.neunet.2021.09.023

Frey, S., & Goldstone, R. L. (2013). Cyclic Game Dynamics Driven by Iterated Reasoning. PLoS ONE, 8(2), e56416. https://doi.org/10.1371/journal.pone.0056416

Guennouni, I., & Speekenbrink, M. (n.d.). Transfer of Learned Opponent Models in Zero Sum Games.

Hawkins, R. D., Frank, M. C., & Goodman, N. D. (2020). Characterizing the dynamics of learning in repeated reference games. Cognitive Science, 44(6), e12845. http://arxiv.org/abs/1912.07199

Kumaran, V., Mott, B. W., & Lester, J. C. (2019). Generating Game Levels for Multiple Distinct Games with a Common Latent Space. https://ojs.aaai.org/index.php/AIIDE/article/view/7418

Lampinen, A. K., & McClelland, J. L. (2020). Transforming task representations to perform novel tasks. Proceedings of the National Academy of Sciences, 117(52), 32970–32981. https://doi.org/10.1073/pnas.2008852117

Lensberg, T., & Schenk-Hoppé, K. R. (2021). Cold play: Learning across bimatrix games. Journal of Economic Behavior & Organization, 185, 419–441. https://doi.org/10.1016/j.jebo.2021.02.027

Schwarzer, M., Rajkumar, N., Noukhovitch, M., Anand, A., Charlin, L., Hjelm, D., Bachman, P., & Courville, A. (2021). Pretraining Representations for Data-Efficient Reinforcement Learning. http://arxiv.org/abs/2106.04799

Sibert, C., Gray, W. D., & Lindstedt, J. K. (2017). Interrogating Feature Learning Models to Discover Insights Into the Development of Human Expertise in a Real-Time, Dynamic Decision-Making Task. Topics in Cognitive Science, 9(2), 374–394. https://doi.org/10.1111/tops.12225

Spiliopoulos, L. (2013). Beyond fictitious play beliefs: Incorporating pattern recognition and similarity matching. Games and Economic Behavior, 81, 69–85. https://doi.org/10.1016/j.geb.2013.04.005

Spiliopoulos, L. (2015). Transfer of conflict and cooperation from experienced games to new games: A connectionist model of learning. Frontiers in Neuroscience, 9. https://doi.org/10.3389/fnins.2015.00102

Stanić, A., Tang, Y., Ha, D., & Schmidhuber, J. (2022). Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter (No. arXiv:2208.03374). arXiv. http://arxiv.org/abs/2208.03374

Tsividis, P. A., Loula, J., Burga, J., Foss, N., Campero, A., Pouncy, T., Gershman, S. J., & Tenenbaum, J. B. (2021). Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning. http://arxiv.org/abs/2107.12544

Weihs, L., Kembhavi, A., Han, W., Herrasti, A., Kolve, E., Schwenk, D., Mottaghi, R., & Farhadi, A. (2019). Artificial Agents Learn Flexible Visual Representations by Playing a Hiding Game. http://arxiv.org/abs/1912.08195

Zheng, Z. (Sam), Lin, X. (Daisy), Topping, J., & Ma, W. J. (2022). Comparing Machine and Human Learning in a Planning Task of Intermediate Complexity. Proceedings of the Annual Meeting of the Cognitive Science Society, 44(44). https://escholarship.org/uc/item/8wm748d8

3

computing_professor OP t1_ivbanwf wrote

Yes, I have seen the hide and seek results, but I didn't even think of them here! That's a great example.

1

sckuzzle t1_ivbjs9d wrote

If I were to approach this, I'd train them at the same time. You have two models, one for each side, each with its own reward function. Then you'd train them in parallel, playing against each other as they go.

It's a bit of a challenge because you can only train each one relative to the strength of the other, so you need them both to get "smarter" in order to continue their training. But that's no different from a model that trains against itself.
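
A minimal sketch of what that loop could look like, assuming hypothetical `maker_agent`, `breaker_agent`, and `env` objects (none of this is a specific library's API):

```python
def train_in_parallel(maker_agent, breaker_agent, env, n_games=10_000):
    for _ in range(n_games):
        state = env.reset()
        trajectory = []
        while not env.done():
            maker_turn = env.maker_to_move()
            agent = maker_agent if maker_turn else breaker_agent
            action = agent.act(state)
            trajectory.append((state, action, maker_turn))
            state = env.step(action)
        # Each side learns only from its own moves, scored by its own reward function.
        maker_moves = [(s, a) for (s, a, is_maker) in trajectory if is_maker]
        breaker_moves = [(s, a) for (s, a, is_maker) in trajectory if not is_maker]
        maker_agent.update(maker_moves, reward=env.maker_reward())
        breaker_agent.update(breaker_moves, reward=env.breaker_reward())
```

The key design choice is that both agents only ever see opponents of roughly their own strength, which is what keeps the curriculum moving as they improve together.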

2

computing_professor OP t1_ivbxeyc wrote

Yes, this is what made me think of how GANs are trained. I'll see what's out there so I don't have to reinvent the wheel, and tweak as needed!

1