smallest_meta_review
smallest_meta_review OP t1_ivhz0g2 wrote
Reply to comment by Nameless1995 in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Interesting. So self-distillation is using the same capacity model as student and teacher -- are there papers which significantly increase model capacity? I thought the main use of distillation in SL was reducing inference time but would be interested to know of cases where we actually use a much bigger student model.
smallest_meta_review OP t1_ivcghme wrote
Reply to comment by luchins in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Oh, so one of the examples in the blog post is that we start with a DQN agent with a 3-layer CNN architecture and reincarnate another Rainbow agent with a ResNet architecture (Impala-CNN) using the QDagger approach for reincarnation. Once reincarnated, the ResNet Rainbow agent is further trained with RL to maximize reward. See the paper here for more details: https://openreview.net/forum?id=t3X5yMI_4G2
smallest_meta_review OP t1_ivcf2tb wrote
Reply to comment by _der_erlkonig_ in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
While the critique is fair, if the alternative is always train agents from scratch, then reincarnating RL seems like a more reasonable alternative. Furthermore, dependence on prior computation doesn't stop NLP / vision researchers from reusing prior computation (pretrained models), so it seems worthwhile to do so in RL research too.
Re role of distillation distillation, the paper combines online distillation (Dagger) + RL to increase model capacity (rather than decrease capacity akin to SL) and wean off the distillation loss over time for training the agent only with RL loss .. the paper calls it a simple baseline. Also, it's unclear what's the best way to reuse prior computation given in a form other than learned agents, which is what the paper argues to study.
Re source of gains, if the aim is to benchmark RL methods in an RRL context, all methods would use the exact same prior computation and same reincarnating RL method for fair comparison. In this setup, it's likely that the supervised learning losses (if used) would add stability to the RL training process.
smallest_meta_review OP t1_ivanqcm wrote
Reply to comment by pm_me_your_pay_slips in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Haha, if you have tons of compute and several lifetimes to wait for tabula rasa RL to solve real problems :)
smallest_meta_review OP t1_ivancqx wrote
Reply to comment by anonymousTestPoster in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Good question. I feel it's going one step further and saying why not reuse prior computational work (e.g., existing learned agents) in the same problem especially if that problem is computationally demanding (large scale RL papers do this but research papers don't). So, next time we train a new RL agent, we reuse prior computation rather than starting from scratch (e.g., we train new agents on Atari games given a pretrained DQN agent from 2015).
Also, in reincarnating RL, we don't have to stick to the same pretrained network architecture and can possibly try some other architecture too.
smallest_meta_review OP t1_ivam34g wrote
Reply to comment by smurfpiss in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Yeah, or even across different classes of RL methods: reusing a policy for training a value-based RL (e.g, DQN) or model-based RL method.
smallest_meta_review OP t1_ivaghqa wrote
Reply to comment by smurfpiss in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Good question. The original blog post somewhat covers this:
> Imagine a researcher who has trained an agent A_1 for some time, but now wants to experiment with better architectures or algorithms. While the tabula rasa workflow requires retraining another agent from scratch, Reincarnating RL provides the more viable option of transferring the existing agent A1 to a different agent and training this agent further, or simply fine-tuning A_1.
But this is not what happens in research. For example, each time we are training a new agent to let say play an Atari game, we train it from scratch ignoring all the prior agents trained on that game. This work argues that why not reuse learned knowledge from the existing agent while training new agents (which may be totally different).
smallest_meta_review OP t1_iva4dj7 wrote
Reply to comment by [deleted] in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
> Tabula rasa RL vs. Reincarnating RL (RRL). While tabula rasa RL focuses on learning from scratch, RRL is based on the premise of reusing prior computational work (e.g., prior learned agents) when training new agents or improving existing agents, even in the same environment. In RRL, new agents need not be trained from scratch, except for initial forays into new problems.
More at https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html?m=1
smallest_meta_review OP t1_iva2n3z wrote
Reply to comment by BobDope in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
LOL. This is what I clarify before I talk about this. Here it's in the context of reincarnating an existing RL agent to a new agent (possibly with a different architecture and algorithm).
smallest_meta_review OP t1_iva27vt wrote
Reply to comment by TiredOldCrow in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
While nurture + nature seems useful across lifetimes, reincarnation might be how we learn during our lifetimes? I am not an expert but I found this comment interesting:
> This must be a fundamental part of how primates like us learn, piggybacking off of an existing policy at some level, so I'm all for RL research that tries to formalize ways it can work computationally.
smallest_meta_review OP t1_iva1nr2 wrote
Reply to comment by essahjott in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
https://agarwl.github.io/reincarnating_rl for paper, code, blog post and trained agents.
smallest_meta_review OP t1_ivjle6n wrote
Reply to comment by Nameless1995 in [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
Thanks for your informative reply. If interested, we have previously applied results from self-distillation to show that implicit regularization can actually lead to capacity loss in RL as bootstrapping can be viewed as self-distillation: https://drive.google.com/file/d/1vFs1FDS-h8HQ1J1rUKCgpbDlKTCZMap-/view?usp=drivesdk