
sckuzzle t1_ivbjs9d wrote

If I were to approach this, I'd train them at the same time. You have two models, one for each side, each with its own reward function. You'd then train them in parallel, playing against each other as they go.

It's a bit of a challenge because you can only train each model relative to the strength of the other, so you need both of them to get "smarter" for training to keep progressing. But that's no different from a model that trains against itself in self-play.
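To make the idea concrete, here is a minimal sketch of two agents trained in parallel against each other. Everything in it is an assumption for illustration, not something from the original question: the toy game (an attacker and a defender each pick one of two sites), the `Agent` class, the two reward functions, and the REINFORCE-style update are all hypothetical stand-ins for whatever game and models you actually have.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Agent:
    """Hypothetical agent: a softmax policy over a fixed set of actions."""

    def __init__(self, n_actions, lr=0.05):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def policy(self):
        return softmax(self.logits)

    def act(self, rng):
        probs = self.policy()
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, action, reward):
        # REINFORCE step: the gradient of log pi(action) with respect to
        # the logits is one_hot(action) - probs.
        probs = self.policy()
        for a in range(len(self.logits)):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.logits[a] += self.lr * reward * grad


# Each side has its own reward function (assumed toy game).
def attacker_reward(attack_site, defend_site):
    return 1.0 if attack_site != defend_site else -1.0


def defender_reward(attack_site, defend_site):
    return 1.0 if attack_site == defend_site else -1.0


def train(steps=2000, seed=0):
    """Train both agents at the same time, each playing against the other."""
    rng = random.Random(seed)
    attacker, defender = Agent(2), Agent(2)
    for _ in range(steps):
        a = attacker.act(rng)
        d = defender.act(rng)
        # Both agents learn from the same game, each with its own reward.
        attacker.update(a, attacker_reward(a, d))
        defender.update(d, defender_reward(a, d))
    return attacker, defender


if __name__ == "__main__":
    attacker, defender = train()
    print("attacker policy:", attacker.policy())
    print("defender policy:", defender.policy())
```

In a toy game like this the two policies can chase each other around rather than settle, which is a small-scale version of the "only as strong as the opponent" issue. In a real setup you'd likely swap the softmax tables for actual networks and keep the same loop structure.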

2

computing_professor OP t1_ivbxeyc wrote

Yes, this is what made me think of how GANs are trained. I'll see what's out there so I don't have to reinvent the wheel, and tweak as needed!

1