Submitted by randomkolmogorov t3_zf25ue in MachineLearning
UnusualClimberBear t1_izaa5xr wrote
Reply to comment by Ulfgardleo in [Discussion] Suggestions on Trust Region Methods For Natural Gradient by randomkolmogorov
For RL you also need to account for the uncertainty from the state-action pairs that you barely sampled during data collection but would like the new policy to use more. A gradient on a policy behaves differently from a gradient on a supervised loss.
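A rough sketch of that point, assuming a toy single-state problem with two actions and made-up probabilities (the function names are hypothetical, not from any library). It computes the variance of the importance ratio target/behavior when actions are drawn from the behavior policy, which equals the chi-square divergence between the two policies, and the corresponding approximate fraction of samples that stay "effective":

```python
def importance_weight_variance(behavior, target):
    """Variance of target[a] / behavior[a] when a is sampled from behavior.

    Assumes behavior[a] > 0 wherever target[a] > 0, so the ratio has mean 1.
    """
    second_moment = sum(t * t / b for b, t in zip(behavior, target) if t > 0)
    return second_moment - 1.0


def effective_sample_fraction(behavior, target):
    """Approximate fraction of collected samples that remain useful."""
    return 1.0 / (1.0 + importance_weight_variance(behavior, target))


# Data was collected with a policy that almost ignored action 1.
behavior = [0.98, 0.02]
# A small update and a large update toward the rarely tried action.
small_shift = [0.95, 0.05]
large_shift = [0.50, 0.50]

for name, target in [("small", small_shift), ("large", large_shift)]:
    var = importance_weight_variance(behavior, target)
    frac = effective_sample_fraction(behavior, target)
    print(f"{name} shift: weight variance = {var:.3f}, effective fraction = {frac:.3f}")
```

With these toy numbers the large shift toward the barely sampled action makes the weight variance far larger than the small shift does, which is one way to see why the step size on a policy has to be limited by how well the collected data covers the actions the new policy prefers.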