currentscurrents t1_ja5n5xi wrote on February 27, 2023 at 12:44 AM

Reply to comment by AmalgamDragon in [D] Isn't self-supervised learning(SSL) simply a kind of SL? by Linear--

Those are all internal rewards, which your brain creates because it knows (according to the world model) that these events lead to real rewards. It can only do this because it has learned to predict the future.

>PPO can handle this quite well.

"Quite well" still means trying random actions millions of times. World modeling lets you learn from two orders of magnitude less data.
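
To make the data-efficiency point concrete, here is a toy sketch, not anything from PPO or a real benchmark. It assumes a made-up deterministic chain environment with a single "move right" action and a reward only at the final state, and it uses Dyna-style planning as a stand-in for "world modeling". The function name `run_episode` and all its parameters are hypothetical.

```python
def run_episode(n_states=5, planning_sweeps=0, alpha=1.0, gamma=0.9):
    """Run ONE real episode on a toy chain, then optionally replay the learned model.

    States are 0..n_states-1. The only action is 'move right'.
    Reaching the last state gives reward 1.0 and ends the episode.
    Returns the learned value of 'move right' for every state.
    """
    q = [0.0] * n_states   # value estimates (terminal state stays at 0)
    model = {}             # learned world model: state -> (reward, next_state)

    # Real experience: a single pass through the environment.
    state = 0
    while state < n_states - 1:
        next_state = state + 1
        reward = 1.0 if next_state == n_states - 1 else 0.0
        model[state] = (reward, next_state)          # remember what happened
        target = reward + gamma * q[next_state]
        q[state] += alpha * (target - q[state])      # ordinary TD update
        state = next_state

    # Imagined experience: replay transitions from the model, no new real data.
    for _ in range(planning_sweeps):
        for s, (r, s2) in model.items():
            q[s] += alpha * (r + gamma * q[s2] - q[s])

    return q


if __name__ == "__main__":
    print("model-free :", run_episode(planning_sweeps=0))
    print("with model :", run_episode(planning_sweeps=3))
```

With the same single real episode, the model-free run only learns a value for the state right before the reward, while the run with three planning sweeps has already propagated value back to the start state (about 0.9^3 ≈ 0.729). The model-free learner would need several more real episodes to get there, which is the gap in miniature.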