Oripy

Oripy t1_j8m8ejv wrote

I have a question about the Actor Critic method described in the Keras example here: https://keras.io/examples/rl/actor_critic_cartpole/

I looked at the code in the Train part, and I think I understand what each line is supposed to do and why it is there. However, I don't think I understand what role the critic plays in improving the agent. To me, the critic is just an output that predicts the future reward, and I don't see it being fed back into the system so that the agent selects better actions and improves its reward.

Is my understanding correct? Is the critic just a "bonus" output? Are the two unrelated, so that the exact same performance could be achieved by removing the critic output altogether? Or is the critic output used to improve how quickly the agent learns in a way I fail to see?
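
For reference, here is a minimal sketch of how the per-step losses in an actor-critic update of this kind are typically formed. It is plain Python rather than the example's TensorFlow code, and the function name `actor_critic_losses` and its arguments are assumptions made for illustration. The critic loss is a squared error here, whereas the Keras example uses a Huber loss.

```python
def actor_critic_losses(log_probs, values, returns):
    """Per-step actor and critic losses for one episode (illustrative sketch).

    log_probs: log-probability of each action that was taken
    values:    critic's predicted return at each step
    returns:   discounted return actually observed from each step
    """
    actor_losses, critic_losses = [], []
    for log_prob, value, ret in zip(log_probs, values, returns):
        # Advantage: how much better the outcome was than the critic predicted.
        advantage = ret - value
        # Actor loss: the log-probability is weighted by the advantage,
        # so the critic's estimate acts as a baseline for the actor update.
        actor_losses.append(-log_prob * advantage)
        # Critic loss: squared error between prediction and observed return
        # (assumption: the Keras example uses a Huber loss here instead).
        critic_losses.append((ret - value) ** 2)
    return actor_losses, critic_losses


# Hypothetical numbers for a single step.
actor, critic = actor_critic_losses([-0.5], [1.0], [3.0])
print(actor, critic)
```

In this sketch the critic's estimate reaches the actor's loss only through `advantage`. Setting every `value` to 0 would leave the actor weighted by the raw return alone.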

Thank you.
