dualmindblade t1_iva5ncr wrote on November 6, 2022 at 1:54 PM

Disclaimer: not even close to an expert, I just keep up with the state of the field

If you were using something like the AlphaZero algorithm, I'm fairly certain the asymmetry is not an issue; it would work unmodified. I also don't think you'd want to use two models, since that would weaken the play. The argument is that the NN part is trying to intuit the properties of a big tree search in which both players are participants, so it must understand both strategies about equally regardless of which side it's playing. It's no different from a human player: when you make a move, the next step is to consider the resulting position from the other side and evaluate their potential moves. BTW, chess is not super symmetric in practice; usually Black will need to adopt a defensive strategy in the opening.
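To make the "one model for both sides" point concrete, here is a minimal sketch, not AlphaZero itself: a plain exhaustive search over a made-up asymmetric heap game. The game, the `MOVES` table, and the `value` function are all hypothetical, invented for illustration. The point is that a single evaluator keyed on (position, player to move) handles both sides, even though the two players have different legal moves.

```python
from functools import lru_cache

# Hypothetical asymmetric game: one heap of tokens, each player has a
# different set of allowed removals, and a player who cannot move loses.
MOVES = {0: (1, 2), 1: (1, 3)}


@lru_cache(maxsize=None)
def value(heap, player):
    """Return +1 if the player to move wins with perfect play, else -1."""
    best = -1  # with no legal move, the player to move loses
    for take in MOVES[player]:
        if take <= heap:
            # The resulting state is scored from the opponent's point of
            # view, so negate it to get our own point of view.
            best = max(best, -value(heap - take, 1 - player))
    return best
```

With a heap of 2, `value(2, 0)` is `1` while `value(2, 1)` is `-1`: same heap, different outcome depending on who moves, and the same function covers both cases.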

computing_professor OP t1_iva7s6f wrote on November 6, 2022 at 2:10 PM

Cool, thanks for the reply. With chess, I always assumed it was just examining the state as a pair (board,turn), regardless of who went first. I study the mathematics of combinatorial games and it's rare to ever consider who moves first, as it's almost always more interesting to determine the best move for any given game state.

Do you have any reading suggestions for understanding AlphaZero? I've read surface level/popular articles, but I'm a mathematician and would like to dig deeper into it. And, of course, learn how to apply it in my case.

dualmindblade t1_iva9p3g wrote on November 6, 2022 at 2:24 PM

I would suggest reading the original AlphaGo paper, it's extremely digestible, then skimming the AlphaZero one. There's less detail there because it's a very similar architecture, and it's actually simpler than the original. Think of AlphaZero as a scheme for improving the loss function; the actual architecture of the NN part is sort of unimportant. You can think of it as a black box, or maybe a black box with two smaller boxes sticking out of it (one for move probabilities, one for the value of the position).
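A rough sketch of the "black box with two smaller boxes" picture, in plain Python with random weights. This is a toy, not the real architecture (the papers use deep residual convolutional networks); the class name `TwoHeadedNet`, the layer sizes, and the feature vector are all assumptions made up for illustration. What it does show is the shape of the interface: one shared trunk, one output for move probabilities, one output for a scalar value.

```python
import math
import random


class TwoHeadedNet:
    """Toy stand-in: a shared trunk feeding a policy head and a value head."""

    def __init__(self, n_inputs, n_hidden, n_moves, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.trunk = matrix(n_hidden, n_inputs)       # the "black box"
        self.policy_head = matrix(n_moves, n_hidden)  # small box 1
        self.value_head = matrix(1, n_hidden)         # small box 2

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def forward(self, features):
        hidden = [math.tanh(h) for h in self._matvec(self.trunk, features)]
        # Policy head: softmax over moves (shifted by the max for stability).
        logits = self._matvec(self.policy_head, hidden)
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: a single number squashed into [-1, 1].
        value = math.tanh(self._matvec(self.value_head, hidden)[0])
        return policy, value
```

Calling `TwoHeadedNet(4, 8, 3).forward([1.0, 0.0, -1.0, 0.5])` returns a 3-entry probability list and one scalar, which is all the search needs from the network.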

computing_professor OP t1_ivaak87 wrote on November 6, 2022 at 2:30 PM

Thanks!

sckuzzle t1_ivbiskk wrote on November 6, 2022 at 7:23 PM

> so it must understand both strategies about equally regardless of which side it's playing

What do you mean here by "understand"? My understanding is that the state-of-the-art AI has no concept of what the capabilities of its opponent are or even what its opponent might be thinking; it only understands how to react in order to maximize a score.

So while you could train it to react well no matter which side it is playing, how would it benefit from being able to play the other side better? It would need to spin up a duplicate of itself to play the other side and then analyze itself to understand what is happening, but then it would just get into an infinite loop as its duplicate self spins up its own duplicate.

I guess what I'm getting at is that these AI algorithms have no theory of mind. They are simple stimulus-response models. Even the concept of an opposing player is beyond them: it'd be the same whether it was playing solitaire or chess.

dualmindblade t1_ivbs4ra wrote on November 6, 2022 at 8:23 PM

By the other side, I meant the other side of the board, but let's explore your ideas a bit in the context of board game algorithms. In the case of the AlphaZero algorithm, the opponent is itself. The neural network part of AlphaZero acts as a sort of intuition engine, and it's trying to intuit two related but different things: (1) the value of a position, i.e. how good or bad it is for the player to move, and (2) which move AlphaZero itself will be likely to choose after it has thought about it for a long time. By thinking, I mean running many, many simulated lines of play from the current position, with the move choices guided by both intuitions rather than picked uniformly at random.

This is the novel idea of the algorithm, and it allows it to drastically magnify the amount of data used to train the neural network. Instead of having to play an entire game to get one tiny bit of feedback, it gets feedback for every possible move on every turn: the network weights are updated based on how well it predicts its own behavior. There's growing evidence that animal brains do something similar; this is called the predictive processing model of cognition. Anyway, I want to point out that this very much seems like a theory of mind, except it's a theory not of another mind but of its own.

BTW, after training, AlphaZero becomes ridiculously good not only at predicting its own behavior but also at predicting the value of a position. The Go-playing version is reported to play at a very strong level without doing any tree search whatsoever, in other words making moves using only a single pass through the NN part of the architecture (the intuition) and not looking even one move ahead. Likewise, it is remarkably accurate, though not perfectly so, at predicting its final decision after searching the game tree, so its conception of self is accurate.
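The "predicting its own behavior" loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published implementation: `puct_select` follows the general shape of the PUCT rule from the AlphaGo Zero paper (average value plus a prior-weighted exploration bonus), but the function names, the `c_puct=1.5` constant, and the `+ 1` inside the square root (added here so the prior still matters before any visits) are my own choices. `policy_target` shows where the "what I chose after thinking" training signal comes from: the normalized visit counts at the root.

```python
import math


def puct_select(priors, visit_counts, total_values, c_puct=1.5):
    """Pick the index of the move to explore next during search.

    priors:       network's move probabilities (intuition 2)
    visit_counts: how often each move has been explored so far
    total_values: summed value estimates (intuition 1) seen under each move
    """
    total_visits = sum(visit_counts)
    best_move, best_score = 0, -math.inf
    for move, prior in enumerate(priors):
        n = visit_counts[move]
        q = total_values[move] / n if n > 0 else 0.0  # average value so far
        # Exploration bonus: large for likely-but-unvisited moves.
        u = c_puct * prior * math.sqrt(total_visits + 1) / (1 + n)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move


def policy_target(visit_counts):
    """Training target for the policy head: normalized root visit counts."""
    total = sum(visit_counts)
    return [n / total for n in visit_counts]
```

Before any visits, the move with the highest prior gets explored first. Once a move has been visited a lot and looks bad, the bonus shifts attention to the others, and the final visit distribution is what the network is trained to predict.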

Now there's another engine called Maia, a chess engine designed not to beat humans but to play like they do, and it's quite good at this. It can imitate human play across a range of skill levels. There's no obvious reason this couldn't be integrated into the AlphaZero algorithm, providing it with not only a theory of its own mind but also that of a (generic) human player. And if you don't like that generic part, there are engines fine-tuned on individual humans, usually professional players with a lot of games in the database. So basically, yes, they are stimulus-response models, and they always will be, but they're complicated ones where the majority of the stimulus is generated internally, and humans probably are too. And they are capable, even today, of having a theory of mind by any reasonable definition of what that means.