Submitted by Linear-- t3_11bcklh in MachineLearning
currentscurrents t1_ja5n5xi wrote
Reply to comment by AmalgamDragon in [D] Isn't self-supervised learning(SSL) simply a kind of SL? by Linear--
Those are all internal rewards, which your brain creates because it knows (according to the world model) that these events lead to real rewards. It can only do this because it has learned to predict the future.
>PPO can handle this quite well.
"Quite well" still means trying random actions millions of times. World modeling allows you to learn from two orders of magnitude less data.
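To make the sample-efficiency point concrete, here is a minimal tabular Dyna-Q sketch (Sutton's Dyna architecture, swapped in purely for illustration; it is not what the commenter described, and it is not PPO). Each real transition is stored in a learned world model and then replayed `planning_steps` times as imagined experience, so reward information spreads through the value table without extra environment interaction. The corridor environment, the names `env_step` and `dyna_q`, and all hyperparameters are hypothetical. Plain tabular Q-learning stands in for a model-free method like PPO, and nothing here measures or reproduces the "two orders of magnitude" figure.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor of N_STATES cells.
# Action 0 moves left, action 1 moves right; reaching the last cell pays 1 and ends the episode.
N_STATES = 10
ACTIONS = (0, 1)


def env_step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes, planning_steps, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Dyna-Q; planning_steps=0 reduces it to plain model-free Q-learning."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated return
    model = {}              # tabular world model: (state, action) -> (reward, next_state, done)
    real_steps = 0

    def greedy(s):
        # Break ties randomly so an untrained agent wanders instead of repeating one action.
        best = max(q[(s, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = env_step(s, a)   # one real interaction with the environment
            real_steps += 1
            update(s, a, r, s2, done)      # learn directly from the real transition
            model[(s, a)] = (r, s2, done)  # remember what the world did
            for _ in range(planning_steps):
                # Replay an imagined transition from the model: no new real data needed.
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return q, real_steps


q, steps = dyna_q(episodes=20, planning_steps=50)
print(steps, [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)])
```

Setting `planning_steps=0` gives plain Q-learning, which after its first episode has credited only the single step next to the goal. With the model, every real step also funds many simulated updates, so the reward propagates back toward the start state from the same real experience.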