dualmindblade t1_ivbs4ra wrote on November 6, 2022 at 8:23 PM

Reply to comment by sckuzzle in Training a board game player AI for an asymmetric game by computing_professor

By the other side, I meant the other side of the board, but let's explore your ideas a bit in the context of board game algorithms. In the case of the AlphaZero algorithm, the other opponent is itself. The neural network part of alpha zero acts as a sort of intuition engine, and it's trying to intuit 2 related but actually different things, 1 the value of a particular move, how good or bad it is, 2 which move AlphaZero itself will be likely to choose after it has thought about it for a long time. By thinking, I mean running many many simulated games from the current position, making random moves probabilistically weighted by intuition 1. This is the novel idea of the algorithm, and it allows it to drastically magnify the amount of data used to train the neural network. Instead of having to play an entire game to get 1 tiny bit of feedback it gets it for every possible move every turn, the network weights are updated based on how well it predicts its own behavior. There's growing evidence that animal brains do something similar, this is called the predictive processing model of cognition. Anyway, I want to point out that this very much seems like a theory of mind, except it's a theory not of another mind but if its own. BTW, AlphaZero becomes, after training, ridiculously good not only at predicting its own behavior but at predicting the value of a move. The go playing version can beat all but the very best professional players without doing any tree search whatsoever, in other words making moves using only a single pass along the NN part of the architecture (the intuition) and not looking even one move ahead, likewise it is remarkably accurate, though not perfectly so, at predicting its final decision after searching the game tree, so its conception of self is accurate.

Now there's another game playing engine called Maia, this is designed not to beat humans but to play like they do, and it's quite good at this. It can imitate play of very good amateurs all the way up to professionals. There's absolutely no reason this couldn't be integrated into the AlphaZero algorithm, providing it with not only a theory of its own mind but that of a (generic) human player. And if you don't like that generic part, there are engines fine tuned on single humans, usually professional players with a lot of games in the database. So basically, yes they are stimulus react models, always they will be, but they're complicated ones where the majority of the stimulus is generated internally, and probably so are humans. And they are capable even today of having a theory of mind by any reasonable definition of what that means.