randomkolmogorov OP t1_iza7hwl wrote

Reply to comment by UnusualClimberBear in [Discussion] Suggestions on Trust Region Methods For Natural Gradient by randomkolmogorov

Thank you, this talk is very helpful. I was thinking about the formulation in terms of the natural gradient, but adapting the approach in TRPO to my case seems like a good idea.
I am not really doing RL but rather aleatoric uncertainty quantification, where I need to optimize over a manifold of functions. My distributions are much more manageable than if I were doing policy gradients, so I have a feeling that with some cleverness it might be possible to sidestep a lot of the complications in TRPO while still using the same ideas from the paper.
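Something like this is what I have in mind: if the Fisher matrix of the family is available in closed form (e.g. for a diagonal Gaussian), the TRPO machinery collapses to a single scaled natural-gradient step. Rough Python sketch, where the Gaussian parameterization, the gradient values, and the trust-region radius `delta` are all made-up placeholders:

```python
import numpy as np

def trust_region_natural_step(g, F, delta=0.01):
    """TRPO-style step: move along the natural gradient F^{-1} g, scaled so the
    second-order KL estimate 0.5 * d^T F d of the step equals delta."""
    d = np.linalg.solve(F, g)                      # natural gradient direction
    quad = d @ F @ d                               # = g^T F^{-1} g
    beta = np.sqrt(2.0 * delta / max(quad, 1e-12)) # trust-region step size
    return beta * d

# Toy example: N(mu, sigma^2) with theta = (mu, log sigma).
# The Fisher matrix is closed-form here, diag(1/sigma^2, 2), so none of
# TRPO's conjugate-gradient machinery is needed.
mu, log_sigma = 0.5, -1.0
F = np.diag([np.exp(-2.0 * log_sigma), 2.0])
g = np.array([0.3, -0.1])  # gradient of the objective (placeholder values)
theta_new = np.array([mu, log_sigma]) + trust_region_natural_step(g, F)
print(theta_new)
```

In full TRPO the F^{-1} g solve is done with conjugate gradient and followed by a backtracking line search on the exact KL; with a tractable family both of those seem like the complications one could drop.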