UnusualClimberBear t1_izaa5xr wrote on December 7, 2022 at 5:12 PM

Reply to comment by Ulfgardleo in [Discussion] Suggestions on Trust Region Methods For Natural Gradient by randomkolmogorov

For RL you also need to account for the uncertainty in the state-action pairs that you almost ignored during data collection but would now like to use more. A gradient on a policy behaves differently from a gradient on a supervised loss.
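
To make that concrete, here is a rough sketch of my own (a toy two-action bandit, not anything from a specific paper or library). It assumes you reweight old data with importance weights w(a) = target(a) / behavior(a) and uses 1 / E[w^2] as an approximate "effective sample size" fraction. The function names and the probabilities are made up for illustration.

```python
def weight_second_moment(behavior_probs, target_probs):
    """E_b[w^2] for w(a) = target(a) / behavior(a), with a ~ behavior.

    Equals 1 when the two policies are identical and grows as the target
    policy puts mass on actions the behavior policy rarely sampled.
    """
    return sum(t * t / b for b, t in zip(behavior_probs, target_probs))


def ess_fraction(behavior_probs, target_probs):
    """Approximate fraction of collected samples that stay 'effective'
    after importance reweighting (large-sample Kish ESS divided by n)."""
    return 1.0 / weight_second_moment(behavior_probs, target_probs)


# Hypothetical policies: action 1 was almost ignored during data collection.
behavior = [0.99, 0.01]
near = [0.97, 0.03]   # small policy change
far = [0.50, 0.50]    # wants to use the rarely sampled action a lot more

print("small step:", ess_fraction(behavior, near))
print("large step:", ess_fraction(behavior, far))
```

The large step leaves only a small fraction of the data effectively usable, because almost all of the estimate rests on the handful of samples of the rarely tried action. A supervised loss has no analogue of this, since the data distribution does not move when the parameters do.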