sckuzzle t1_ivbjs9d wrote on November 6, 2022 at 7:30 PM

If I were to approach this, I'd train them at the same time. You have two models - one for each side - each with its own reward function. Then you'd train them in parallel, playing against each other as they go.

It's a bit of a challenge because you can only train each one relative to the strength of the other - so you need them both to get "smarter" in order to continue their training. But that's no different from a single model trained through self-play.
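
Here's a rough sketch of the idea, not a real implementation. Everything in it is made up for illustration: the two-action toy game, the `Agent` class, the `attacker_reward` / `defender_reward` functions, and the learning rate. Each side is a tiny softmax policy updated with a plain REINFORCE step, and both are updated on every round they play against each other. For an actual game you'd swap in your own environment and a proper model.

```python
import math
import random


def softmax(logits):
    """Turn raw preferences into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Agent:
    """Hypothetical minimal policy: one logit per action, no state input."""

    def __init__(self, n_actions, lr=0.05):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def act(self, rng):
        probs = softmax(self.logits)
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, action, reward):
        # REINFORCE step: d log pi(action) / d logit_i = 1[i == action] - pi(i)
        probs = softmax(self.logits)
        for i, p in enumerate(probs):
            grad = (1.0 if i == action else 0.0) - p
            self.logits[i] += self.lr * reward * grad


# Assumed toy rewards: each side has its own reward function,
# and they are deliberately not mirror images of each other.
def attacker_reward(a, d):
    return 1.0 if a != d else -1.0


def defender_reward(a, d):
    return 1.0 if a == d else -0.5


def train(steps=2000, seed=0):
    rng = random.Random(seed)
    attacker, defender = Agent(2), Agent(2)
    for _ in range(steps):
        # Both sides pick a move against the other's *current* policy...
        a = attacker.act(rng)
        d = defender.act(rng)
        # ...and both learn from the same round, each from its own reward.
        attacker.update(a, attacker_reward(a, d))
        defender.update(d, defender_reward(a, d))
    return attacker, defender


if __name__ == "__main__":
    attacker, defender = train()
    print("attacker policy:", softmax(attacker.logits))
    print("defender policy:", softmax(defender.logits))
```

The thing to notice is that neither agent ever sees a fixed opponent: the reward signal each one gets depends on whatever the other has learned so far, which is exactly why they have to improve together.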

computing_professor OP t1_ivbxeyc wrote on November 6, 2022 at 8:55 PM

Yes, this is what made me think of how GANs are trained. I'll see what's out there so I don't have to reinvent the wheel, and tweak as needed!