randomkolmogorov

randomkolmogorov OP t1_iza7hwl wrote

I am not really doing RL but rather aleatoric uncertainty quantification where I need to optimize over a manifold of functions. My distributions are much more manageable than if I were doing policy gradient so I have a feeling that with some cleverness it might be possible to sidestep a lot of the complications in TRPO but use the same ideas in the paper.

4