Submitted by lmtog t3_10zix8k in MachineLearning
Looking at the current research, it seems like Monte Carlo CFR is the de facto standard (e.g., Pluribus).
But could transformers be trained to play poker as well?
Let's say we encode hands into tokens like 5h (5 of hearts) and also pass along info about the current game state, like p1:raise:2bb, p2:fold, and p3:call:2bb. Would the model be able to predict which hands I should be playing? Let's say we train the model by playing it against itself and feed the results back as the training signal.
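To make the idea concrete, here is a minimal sketch of what such an encoding could look like in Python. The vocabulary, the `encode` helper, and the set of action strings are all hypothetical illustrations of the scheme described above, not any real library's API:

```python
# Sketch: mapping card and action tokens to integer IDs that could be
# fed into a transformer's embedding layer. The vocabulary below is a
# made-up illustration of the encoding scheme from the post.

RANKS = "23456789TJQKA"
SUITS = "hdcs"
CARD_TOKENS = [r + s for r in RANKS for s in SUITS]            # "5h", "Ad", ...
ACTION_TOKENS = [f"p{p}:{a}" for p in (1, 2, 3)
                 for a in ("fold", "call:2bb", "raise:2bb")]   # "p2:fold", ...

# One shared vocabulary over cards and actions.
VOCAB = {tok: i for i, tok in enumerate(CARD_TOKENS + ACTION_TOKENS)}

def encode(sequence):
    """Map a list of card/action tokens to integer token IDs."""
    return [VOCAB[tok] for tok in sequence]

seq = ["5h", "5d", "p1:raise:2bb", "p2:fold", "p3:call:2bb"]
ids = encode(seq)
print(ids)  # integer IDs, ready for an embedding layer
```

A real encoding would also need tokens for positions, stack sizes, and variable bet amounts, which blows the vocabulary up quickly; that's one of the design questions here.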
This is just an idea and I haven't dug into transformers very deeply, so there might be something I'm missing.
What are your thoughts on this?
thiru_2718 t1_j83kkbh wrote
Poker depends on looking far enough ahead to play game-theory-optimal (GTO) moves that maximize expected value over a long run of hands. You can train a transformer on a ton of data and get it to predict context-specific plays, but if the number of possible decision branches grows exponentially, is that enough?
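For context on what the CFR family actually computes, the regret-matching update at its core can be shown on a toy matrix game like rock-paper-scissors. This is only a sketch of the update rule, not poker-scale CFR (which additionally traverses a huge imperfect-information game tree), and the asymmetric initial regrets are an arbitrary choice so the demo doesn't start at the fixed point:

```python
# Toy regret matching -- the core update inside CFR -- on rock-paper-scissors.
# Both players' *average* strategies drift toward the uniform Nash
# equilibrium (1/3, 1/3, 1/3), even though the current strategies cycle.

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],   # PAYOFF[a][b] = utility of playing a against b
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Regret matching: play each action proportionally to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

def train(iterations=10000):
    # Slightly biased start for player 0 (arbitrary, just to kick off the dynamics).
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        for p in range(2):
            opp = strats[1 - p]
            # Expected utility of each action against the opponent's current mix.
            action_u = [sum(PAYOFF[a][b] * opp[b] for b in range(ACTIONS))
                        for a in range(ACTIONS)]
            node_u = sum(strats[p][a] * action_u[a] for a in range(ACTIONS))
            for a in range(ACTIONS):
                regrets[p][a] += action_u[a] - node_u  # accumulate regret
                strategy_sum[p][a] += strats[p][a]
    # The time-averaged strategy is what approximates the equilibrium.
    return [[s / iterations for s in row] for row in strategy_sum]

avg = train()
print(avg[0])  # player 0's average strategy, close to uniform
```

The point of the contrast: CFR comes with convergence guarantees to equilibrium in two-player zero-sum games, whereas a transformer trained on self-play data has no such guarantee out of the box.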
But honestly, I don't know much about these kinds of RL problems. How is AlphaGo structured?