Submitted by lmtog t3_10zix8k in MachineLearning
thiru_2718 t1_j83kkbh wrote
Poker depends on looking far enough ahead to play game-theory-optimal (GTO) moves that maximize expected value over a long run of hands. You can train a transformer on a ton of data and get it to predict context-specific plays, but if the number of possible decision branches grows exponentially, is that enough?
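For intuition, the standard route to GTO play is regret minimization rather than pure prediction. Here is a toy sketch (not how a real poker bot is built, just the core idea) of regret matching, the building block of the CFR family of algorithms behind bots like Libratus, converging to the Nash equilibrium of rock-paper-scissors:

```python
import numpy as np

# Payoff matrix for player 1 (rows) vs player 2 (cols): Rock, Paper, Scissors.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def regret_matching(regrets):
    """Turn cumulative positive regrets into a mixed strategy."""
    positive = np.maximum(regrets, 0)
    total = positive.sum()
    return positive / total if total > 0 else np.ones(3) / 3

# Seed asymmetrically so self-play doesn't start at the fixed point.
regrets = np.array([1.0, 0.0, 0.0])
strategy_sum = np.zeros(3)

for _ in range(10_000):
    strategy = regret_matching(regrets)
    strategy_sum += strategy
    # Symmetric self-play: the opponent uses the same strategy this iteration.
    action_values = PAYOFF @ strategy      # EV of each pure action
    ev = strategy @ action_values          # EV of the current mixed strategy
    regrets += action_values - ev          # regret for not having played each action

# Average strategy converges to Nash: approximately [1/3, 1/3, 1/3].
print(strategy_sum / strategy_sum.sum())
```

The per-iteration strategies cycle, but the *average* strategy converges to equilibrium; that distinction between instantaneous play and averaged play is exactly what naive supervised prediction misses.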
But honestly, I don't know much about these kinds of RL problems. How is AlphaGo structured?
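(For reference: AlphaGo and its successor AlphaZero pair a neural network that outputs a policy prior and a value estimate with Monte Carlo tree search. A minimal sketch of the PUCT selection rule the search uses, with made-up numbers standing in for real network outputs:)

```python
import math

C_PUCT = 1.5  # exploration constant (a tuned hyperparameter)

def puct_score(parent_visits, child):
    """AlphaZero-style Q + U: exploitation plus prior-weighted exploration."""
    q = child["W"] / child["N"] if child["N"] > 0 else 0.0
    u = C_PUCT * child["P"] * math.sqrt(parent_visits) / (1 + child["N"])
    return q + u

def select_move(node):
    """Descend to the child move maximizing the PUCT score."""
    return max(node["children"],
               key=lambda m: puct_score(node["N"], node["children"][m]))

# Toy root node: N = visit count, W = total value, P = policy-network prior.
root = {"N": 10, "children": {
    "move_a": {"N": 6, "W": 3.5, "P": 0.7},
    "move_b": {"N": 4, "W": 2.5, "P": 0.3},
}}
print(select_move(root))  # -> "move_a"
```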
lmtog OP t1_j84vc3x wrote
That's what I'm not quite sure about. I assume the result would not be close to the Nash equilibrium, but I don't know, since I haven't worked with transformers before.
I think it comes down to whether we can train a transformer with feedback on which hands were good and which were not. Looking at other responses, it seems that is not possible.
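In principle, that feedback loop is a policy-gradient update: reward-weighted log-likelihood of the actions taken in a hand. A minimal REINFORCE-style sketch of what it could look like; the tiny network here is a stand-in for a transformer, and all names and shapes are hypothetical:

```python
import torch
import torch.nn as nn

# Stand-in policy: 16 state features in, logits over 4 actions out.
policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def reinforce_step(states, actions, reward):
    """states: (T, 16) decision-point features, actions: (T,) action indices,
    reward: scalar net chips won or lost over the whole hand."""
    logits = policy(states)
    log_probs = torch.log_softmax(logits, dim=-1)
    taken = log_probs[torch.arange(len(actions)), actions]
    loss = -(reward * taken).sum()  # push up actions from winning hands
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# One hypothetical hand: 3 decisions, net reward of +2 big blinds.
states = torch.randn(3, 16)
actions = torch.tensor([1, 0, 2])
reinforce_step(states, actions, reward=2.0)
```

Whether such self-play updates actually approach a Nash equilibrium is a separate question: naive policy gradients are known to cycle in zero-sum games rather than converge, which is consistent with the skepticism in the other responses.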