Submitted by randomkolmogorov t3_zf25ue in MachineLearning

Hello!

I've been working on a project that uses the natural gradient, and I was wondering if anyone has suggestions on ways to include higher-order information. Is there some equivalent of the Hessian for natural gradients? If not, is there some way of finding a trust region where the natural-gradient approximation is reasonable?
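
For context, here is roughly the kind of step I'm taking now, as a minimal sketch (a toy 1-D Gaussian fit where the Fisher matrix is closed-form; the rule that scales each step to hold the second-order KL near a radius delta is the usual TRPO-style choice, not anything specific to my problem):

```python
import numpy as np

# Toy problem: fit N(mu, sigma^2) to data by maximum likelihood using
# natural-gradient steps whose length is chosen so the (second-order)
# per-step KL stays near a trust-region radius delta.
data = np.random.default_rng(0).normal(2.0, 0.5, size=1000)

def grad_loglik(mu, sigma):
    # Euclidean gradient of the average log-likelihood in (mu, sigma).
    return np.array([((data - mu) / sigma**2).mean(),
                     (((data - mu)**2 - sigma**2) / sigma**3).mean()])

def fisher(mu, sigma):
    # Closed-form Fisher information of N(mu, sigma^2) in (mu, sigma).
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])

mu, sigma, delta = 0.0, 1.0, 0.01   # delta = KL trust-region radius
for _ in range(100):
    g = grad_loglik(mu, sigma)
    nat = np.linalg.solve(fisher(mu, sigma), g)   # natural gradient F^-1 g
    if g @ nat < 1e-10:                           # effectively converged
        break
    # KL(theta, theta + step) ~ 0.5 * step^T F step, so this scaling
    # puts each step on the trust-region boundary.
    step = np.sqrt(2.0 * delta / (g @ nat)) * nat
    mu, sigma = mu + step[0], sigma + step[1]

print(mu, sigma)   # should end up near (2.0, 0.5)
```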

Thank you!

5

Comments

UnusualClimberBear t1_iza00fp wrote

TRPO is often too slow for applications because of its line search, so researchers often prefer PPO, which is faster and also comes with some guarantees in terms of KL on the state distribution. I'd be curious to hear about your problem if it turns out that TRPO is the best choice.
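
For reference, PPO replaces TRPO's hard KL constraint with a clipped probability ratio in the surrogate objective. A minimal numpy sketch (eps = 0.2 is the common default from the paper; argument names are mine):

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """PPO clipped surrogate: maximize E[min(r*A, clip(r, 1-eps, 1+eps)*A)],
    where r = pi_new(a|s) / pi_old(a|s). Returned negated, as a loss."""
    ratio = np.exp(logp_new - logp_old)
    return -np.mean(np.minimum(ratio * adv,
                               np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv))
```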

1

Ulfgardleo t1_iza5481 wrote

The Hessian is always better information than the natural gradient, because it captures the actual curvature of the objective function, while the NG only captures the curvature of the model. So any second-order trust-region approach with NG information will approach the Hessian.

//edit: I am assuming actual trust region methods, like TR-Newton, and not some RL-ML approximation schemes.
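
To see the distinction in a tiny least-squares example (a hypothetical one-parameter model m(x; theta) = exp(theta*x); for Gaussian noise the Fisher/NG metric reduces to the Gauss-Newton term, while the true Hessian adds the residual-weighted model curvature):

```python
import numpy as np

# Nonlinear least squares f(theta) = 0.5 * sum_i (m(x_i; theta) - y_i)^2
# with m(x; theta) = exp(theta * x). The Gauss-Newton / Fisher term J^T J
# carries only the model's curvature; the full Hessian also carries the
# curvature of the objective through the residuals.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=50)
y = np.exp(0.7 * x) + rng.normal(0.0, 0.1, size=50)

theta = 0.3
m = np.exp(theta * x)
r = m - y          # residuals
J = x * m          # dm/dtheta
d2m = x**2 * m     # d2m/dtheta^2

gauss_newton = np.sum(J * J)               # what NG "sees"
hessian = gauss_newton + np.sum(r * d2m)   # adds function curvature

print(gauss_newton, hessian)  # equal only when the residuals vanish
```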

4

randomkolmogorov OP t1_iza7hwl wrote

I'm not really doing RL, but rather aleatoric uncertainty quantification, where I need to optimize over a manifold of functions. My distributions are much more manageable than if I were doing policy gradients, so I have a feeling that with some cleverness it might be possible to sidestep a lot of the complications in TRPO while still using the same ideas from the paper.
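
One option, since the KL between manageable distributions is often closed-form, is to backtrack against the exact KL instead of the quadratic model, in the spirit of TRPO's line search. A sketch with a 1-D Gaussian as a stand-in for my actual family (the helper names are made up):

```python
import numpy as np

def kl_gauss(mu0, s0, mu1, s1):
    # Exact KL(N(mu0, s0^2) || N(mu1, s1^2)), closed form.
    return np.log(s1 / s0) + (s0**2 + (mu0 - mu1)**2) / (2.0 * s1**2) - 0.5

def trust_region_step(theta, proposed, delta, shrink=0.5, max_tries=10):
    # TRPO-style backtracking, but against the *exact* KL rather than the
    # quadratic model 0.5 * step^T F step. theta is (mu, sigma) and
    # proposed is a natural-gradient step in those coordinates.
    mu, s = theta
    for i in range(max_tries):
        scale = shrink**i
        cand = (mu + scale * proposed[0], s + scale * proposed[1])
        if cand[1] > 0.0 and kl_gauss(mu, s, *cand) <= delta:
            return cand
    return theta  # no acceptable step found; keep current parameters

print(trust_region_step((0.0, 1.0), (1.0, 0.2), delta=0.05))
```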

4

UnusualClimberBear t1_izaa5xr wrote

For RL you also need to account for the uncertainty from the state-action pairs you almost ignored during data collection but would like to use more. A gradient on a policy behaves differently from a gradient on a supervised loss.

5