Submitted by lmtog t3_10zix8k in MachineLearning
thiru_2718 t1_j83kkbh wrote
Poker depends on looking far enough ahead to play game-theory-optimal (GTO) moves that maximize expected value over a long run of hands. You can train a transformer on a ton of data and get it to predict context-specific plays, but if the number of possible decision branches grows exponentially, is that enough?
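For intuition, the standard route to GTO play is regret minimization rather than pure prediction. Here is a toy sketch (not how a real poker bot is built, just the core idea) of regret matching, the building block of the CFR family of algorithms behind bots like Libratus, converging to the Nash equilibrium of rock-paper-scissors:

```python
import numpy as np

# Payoff matrix for player 1 (rows) vs player 2 (cols): Rock, Paper, Scissors.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def regret_matching(regrets):
    """Turn cumulative positive regrets into a mixed strategy."""
    positive = np.maximum(regrets, 0)
    total = positive.sum()
    return positive / total if total > 0 else np.ones(3) / 3

# Seed asymmetrically so self-play doesn't start at the fixed point.
regrets = np.array([1.0, 0.0, 0.0])
strategy_sum = np.zeros(3)

for _ in range(10_000):
    strategy = regret_matching(regrets)
    strategy_sum += strategy
    # Symmetric self-play: the opponent uses the same strategy this iteration.
    action_values = PAYOFF @ strategy      # EV of each pure action
    ev = strategy @ action_values          # EV of the current mixed strategy
    regrets += action_values - ev          # regret for not having played each action

# Average strategy converges to Nash: approximately [1/3, 1/3, 1/3].
print(strategy_sum / strategy_sum.sum())
```

The per-iteration strategies cycle, but the *average* strategy converges to equilibrium; that distinction between instantaneous play and averaged play is exactly what naive supervised prediction misses.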
But honestly, I don't know much about these kinds of RL problems. How is AlphaGo structured?
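(For reference: AlphaGo and its successor AlphaZero pair a neural network that outputs a policy prior and a value estimate with Monte Carlo tree search. A minimal sketch of the PUCT selection rule the search uses, with made-up numbers standing in for real network outputs:)

```python
import math

C_PUCT = 1.5  # exploration constant (a tuned hyperparameter)

def puct_score(parent_visits, child):
    """AlphaZero-style Q + U: exploitation plus prior-weighted exploration."""
    q = child["W"] / child["N"] if child["N"] > 0 else 0.0
    u = C_PUCT * child["P"] * math.sqrt(parent_visits) / (1 + child["N"])
    return q + u

def select_move(node):
    """Descend to the child move maximizing the PUCT score."""
    return max(node["children"],
               key=lambda m: puct_score(node["N"], node["children"][m]))

# Toy root node: N = visit count, W = total value, P = policy-network prior.
root = {"N": 10, "children": {
    "move_a": {"N": 6, "W": 3.5, "P": 0.7},
    "move_b": {"N": 4, "W": 2.5, "P": 0.3},
}}
print(select_move(root))  # -> "move_a"
```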
lmtog OP t1_j84vc3x wrote
That's what I'm not quite sure about. I assume the result would not be close to the Nash equilibrium, but I don't know, since I haven't worked with transformers before.
I think it comes down to whether we can train a transformer with feedback on which hands were good and which were not. Looking at other responses, it seems that is not possible.
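In principle, that feedback loop is a policy-gradient update: reward-weighted log-likelihood of the actions taken in a hand. A minimal REINFORCE-style sketch of what it could look like; the tiny network here is a stand-in for a transformer, and all names and shapes are hypothetical:

```python
import torch
import torch.nn as nn

# Stand-in policy: 16 state features in, logits over 4 actions out.
policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def reinforce_step(states, actions, reward):
    """states: (T, 16) decision-point features, actions: (T,) action indices,
    reward: scalar net chips won or lost over the whole hand."""
    logits = policy(states)
    log_probs = torch.log_softmax(logits, dim=-1)
    taken = log_probs[torch.arange(len(actions)), actions]
    loss = -(reward * taken).sum()  # push up actions from winning hands
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# One hypothetical hand: 3 decisions, net reward of +2 big blinds.
states = torch.randn(3, 16)
actions = torch.tensor([1, 0, 2])
reinforce_step(states, actions, reward=2.0)
```

Whether such self-play updates actually approach a Nash equilibrium is a separate question: naive policy gradients are known to cycle in zero-sum games rather than converge, which is consistent with the skepticism in the other responses.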