Submitted by randomkolmogorov t3_zf25ue in MachineLearning
Hello!
I've been working on a project that uses the Natural Gradient, and I was wondering if anyone has suggestions on ways to include higher-order information. Is there an equivalent of the Hessian for natural gradients? If not, is there a way to find a trust region in which the Natural Gradient approximation is reasonable?
Thank you!
UnusualClimberBear t1_iz9ohx8 wrote
TRPO follows the same direction as the natural policy gradient (NPG), with the maximal step size that still satisfies the quadratic approximation of the KL constraint. I'm not sure what you would like to do better.
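Roughly, in code (a minimal numpy sketch of that step-size rule; the name `trpo_step` and the radius `delta` are illustrative, and a small explicit Fisher matrix is assumed, whereas actual TRPO uses conjugate gradient with Fisher-vector products plus a backtracking line search rather than a direct solve):

```python
import numpy as np

def trpo_step(grad, fisher, delta=0.01):
    """Sketch of the TRPO step: natural gradient direction with the
    maximal step size allowed by the quadratic KL model.

    grad:   policy gradient g
    fisher: Fisher information matrix F (curvature of the KL divergence)
    delta:  trust-region radius on the KL divergence

    Natural gradient direction: x = F^{-1} g.
    Choose beta so the quadratic KL approximation hits the boundary:
        0.5 * (beta x)^T F (beta x) = delta
        => beta = sqrt(2 delta / (x^T F x)) = sqrt(2 delta / (g^T x))
    """
    x = np.linalg.solve(fisher, grad)         # natural gradient direction
    beta = np.sqrt(2.0 * delta / (grad @ x))  # maximal step under the KL constraint
    return beta * x
```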
Nicolas Le Roux gave a nice talk on RL viewed as an optimization problem: https://slideslive.com/38935818/policy-optimization-in-reinforcement-learning-rl-as-blackbox-optimization