TiredOldCrow t1_iv8tqar wrote on November 6, 2022 at 4:18 AM

I know it's naive to expect machine learning to imitate life too closely, but for animals, "models" that are successful enough to produce offspring pass on elements of those "weights" to their children through nature+nurture.

The idea of weighting more successful previous models more heavily when "reincarnating" future models, and potentially borrowing some concepts from genetic algorithms with respect to combining multiple successful models seems interesting to me.

ingambe t1_ivaj6e5 wrote on November 6, 2022 at 3:31 PM

Evolution strategies work much like the process you described. For very small neural networks, they work very well, especially in environments with sparse or quasi-sparse rewards. But as soon as you try a larger neural net (CNN + MLP, or a Transformer-like arch), the process becomes super noisy and you either need to produce tons of offspring for the population or use gradient-based techniques.
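A minimal sketch of the idea, assuming a toy quadratic fitness function and a simple "keep the elite, average them, mutate" scheme. The names `evolve` and `toy_fitness` and all hyperparameters are made up for illustration; this is not a specific published ES variant.

```python
import numpy as np

def evolve(fitness, dim, pop_size=50, elite_frac=0.2, sigma=0.1,
           generations=100, seed=0):
    """Toy evolution strategy: mutate a parent, keep the best offspring."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)                       # parent "weights"
    n_elite = max(1, int(pop_size * elite_frac))
    for _ in range(generations):
        # Offspring are the parent weights plus Gaussian mutations.
        population = mean + sigma * rng.standard_normal((pop_size, dim))
        scores = np.array([fitness(w) for w in population])
        # Only the most successful offspring pass on their weights.
        elite = population[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
    return mean

# Hypothetical objective: get as close as possible to a target vector.
target = np.array([1.0, -2.0, 0.5])

def toy_fitness(w):
    return -np.sum((w - target) ** 2)

best = evolve(toy_fitness, dim=3)
print(best, toy_fitness(best))
```

With 3 parameters this converges quickly. The number of offspring needed to get a usable signal grows with the parameter count, which is the noise problem with larger networks mentioned above.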

life_is_harsh t1_iva1h5l wrote on November 6, 2022 at 1:20 PM

I feel both are useful, no? I thought of reincarnation as how humans learn: We don't learn from a blank slate but often reuse our own learned knowledge or learn from others during our lifetime (e.g., when learning to play a sport, we might learn from an instructor but eventually learn on our own).

smallest_meta_review OP t1_iva27vt wrote on November 6, 2022 at 1:26 PM

While nurture + nature seems useful across lifetimes, reincarnation might be how we learn during our lifetimes? I am not an expert but I found this comment interesting:

> This must be a fundamental part of how primates like us learn, piggybacking off of an existing policy at some level, so I'm all for RL research that tries to formalize ways it can work computationally.

OG Comment

DanJOC t1_ivbg48k wrote on November 6, 2022 at 7:06 PM

Essentially a GAN

veshneresis t1_ivdazlf wrote on November 7, 2022 at 2:57 AM

What are you seeing as the similarity to a GAN? Not sure I can really see how it’s similar?

essahjott t1_iv9mkt6 wrote on November 6, 2022 at 10:37 AM

Actual link: https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html?m=1

smallest_meta_review OP t1_iva1nr2 wrote on November 6, 2022 at 1:21 PM

https://agarwl.github.io/reincarnating_rl for paper, code, blog post and trained agents.

_der_erlkonig_ t1_ivb0pya wrote on November 6, 2022 at 5:28 PM

Not to be that guy, but it kind of seems like this is just finally acknowledging that distillation is a good idea for RL too. They even use the teacher-student terminology. Distilling a teacher to a student with a different architecture is something they make a big deal about in the paper, but people have been doing this for years in supervised learning. It's neat and important work, but the RRL branding is obnoxious and unnecessary IMO.

From a scientific standpoint, I think this methodology is also less useful than the authors advertise. Unlike supervised learning, RL is infamously sensitive to initial conditions, and adding another huge variable like the exact form of distillation used (which may reduce compute used) will make it even more difficult to isolate the source of "gains" in RL research.

smallest_meta_review OP t1_ivcf2tb wrote on November 6, 2022 at 10:55 PM

While the critique is fair, if the alternative is to always train agents from scratch, then reincarnating RL seems like a more reasonable alternative. Furthermore, dependence on prior computation doesn't stop NLP / vision researchers from reusing prior computation (pretrained models), so it seems worthwhile to do so in RL research too.

Re the role of distillation: the paper combines online distillation (DAgger) + RL to increase model capacity (rather than decrease capacity, as is typical in SL) and weans off the distillation loss over time so that the agent is eventually trained only with the RL loss. The paper calls it a simple baseline. Also, it's unclear what's the best way to reuse prior computation given in a form other than learned agents, which is what the paper argues we should study.
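A rough sketch of what "RL loss plus a distillation loss that is weaned off" could look like. The linear decay in `distill_weight`, the softmax-over-Q-values policy, and all function names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Numerically stable softmax over the action dimension.
    z = (x - x.max(axis=-1, keepdims=True)) / temperature
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_weight(step, wean_steps):
    # Assumed schedule: weight decays linearly from 1 to 0, then stays at 0.
    return max(0.0, 1.0 - step / wean_steps)

def combined_loss(student_q, teacher_q, td_targets, actions, step, wean_steps):
    batch = np.arange(len(actions))
    # RL part: squared TD error on the actions that were taken.
    td_loss = np.mean((student_q[batch, actions] - td_targets) ** 2)
    # Distillation part: cross-entropy between teacher and student policies.
    teacher_pi = softmax(teacher_q)
    student_log_pi = np.log(softmax(student_q))
    distill_loss = -np.mean(np.sum(teacher_pi * student_log_pi, axis=-1))
    lam = distill_weight(step, wean_steps)
    return td_loss + lam * distill_loss, td_loss, distill_loss, lam

# Made-up batch of 2 states with 2 actions each.
student_q = np.array([[1.0, 2.0], [0.5, 0.1]])
teacher_q = np.array([[0.0, 3.0], [2.0, 0.0]])
td_targets = np.array([1.5, 0.0])
actions = np.array([1, 0])

for step in (0, 50, 100):
    total, td, dl, lam = combined_loss(
        student_q, teacher_q, td_targets, actions, step, wean_steps=100)
    print(step, lam, total)
```

Once the weight reaches zero, the total is just the TD loss, so the student is trained with RL alone from that point on.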

Re source of gains, if the aim is to benchmark RL methods in an RRL context, all methods would use the exact same prior computation and same reincarnating RL method for fair comparison. In this setup, it's likely that the supervised learning losses (if used) would add stability to the RL training process.

Nameless1995 t1_ivhscyv wrote on November 8, 2022 at 1:42 AM

> (rather than decrease capacity akin to SL)

Distillation in supervised literature doesn't always reduce capacity for the student. I believe iterative distillation and such have been also explored where students have the same capacity but it leads to better calibration or something I forgot. (https://arxiv.org/abs/2206.08491, https://proceedings.neurips.cc/paper/2020/hash/1731592aca5fb4d789c4119c65c10b4b-Abstract.html)

smallest_meta_review OP t1_ivhz0g2 wrote on November 8, 2022 at 2:31 AM

Interesting. So self-distillation is using the same capacity model as student and teacher -- are there papers which significantly increase model capacity? I thought the main use of distillation in SL was reducing inference time but would be interested to know of cases where we actually use a much bigger student model.

Nameless1995 t1_ivi33nf wrote on November 8, 2022 at 3:01 AM

I am not sure. It's not my area of research. I learned of some of these ideas in a presentation made by someone years ago. Some of these recent papers essentially draw a connection between distillation and label smoothing (essentially a way to provide "soft" labels -- this probably connects up with mixup techniques too). So on that ground, you can justify using any kind of teacher/student, I think. Based on the label smoothing connection, some papers go for "teacher-free" distillation. And some others seem to be introducing a "lightweight" teacher instead (I am not sure if the lightweight teacher is lower capacity than the student, which would make it what you were looking for -- students having higher capacity. I haven't really read it beyond the abstract; I just found it a few minutes ago from googling): https://arxiv.org/pdf/2005.09163.pdf (doesn't seem like a very popular paper though, given it was published on arXiv in 2020 and has only 1 citation). Looks like an idea similar to self-distillation was also available under the moniker of "born-again networks" (similar to the reincarnation moniker too): https://arxiv.org/abs/1805.04770

smallest_meta_review OP t1_ivjle6n wrote on November 8, 2022 at 1:25 PM

Thanks for your informative reply. If interested, we have previously applied results from self-distillation to show that implicit regularization can actually lead to capacity loss in RL as bootstrapping can be viewed as self-distillation: https://drive.google.com/file/d/1vFs1FDS-h8HQ1J1rUKCgpbDlKTCZMap-/view?usp=drivesdk

[deleted] t1_iva4670 wrote on November 6, 2022 at 1:42 PM

[deleted]

smallest_meta_review OP t1_iva4dj7 wrote on November 6, 2022 at 1:44 PM

> Tabula rasa RL vs. Reincarnating RL (RRL). While tabula rasa RL focuses on learning from scratch, RRL is based on the premise of reusing prior computational work (e.g., prior learned agents) when training new agents or improving existing agents, even in the same environment. In RRL, new agents need not be trained from scratch, except for initial forays into new problems.

More at https://ai.googleblog.com/2022/11/beyond-tabula-rasa-reincarnating.html?m=1

smurfpiss t1_ivaf5ia wrote on November 6, 2022 at 3:03 PM

Not very experienced with RL, but how is that different from an algorithm going through training iterations?

In that case the parameters are tweaked from past learned parameters. What's the benefit of learning from another algorithm? Is it some kind of weird offspring of skip connections and transfer learning?

smallest_meta_review OP t1_ivaghqa wrote on November 6, 2022 at 3:13 PM

Good question. The original blog post somewhat covers this:

> Imagine a researcher who has trained an agent A_1 for some time, but now wants to experiment with better architectures or algorithms. While the tabula rasa workflow requires retraining another agent from scratch, Reincarnating RL provides the more viable option of transferring the existing agent A1 to a different agent and training this agent further, or simply fine-tuning A_1.

But this is not what happens in research. For example, each time we train a new agent to, let's say, play an Atari game, we train it from scratch, ignoring all the prior agents trained on that game. This work asks: why not reuse learned knowledge from the existing agent while training new agents (which may be totally different)?

smurfpiss t1_ivah7ul wrote on November 6, 2022 at 3:18 PM

So, transfer learning but with different architectures? That's pretty neat. Will give it a read thanks 😊

smallest_meta_review OP t1_ivam34g wrote on November 6, 2022 at 3:50 PM

Yeah, or even across different classes of RL methods: reusing a policy for training a value-based RL (e.g., DQN) or model-based RL method.

[deleted] t1_ivb0jji wrote on November 6, 2022 at 5:27 PM

[deleted]

TheLastVegan t1_ivbvx23 wrote on November 6, 2022 at 8:46 PM

> As reincarnating RL leverages existing computational work (e.g., model checkpoints), it allows us to easily experiment with such hyperparameter schedules, which can be expensive in the tabula rasa setting. Note that when fine-tuning, one is forced to keep the same network architecture; in contrast, reincarnating RL grants flexibility in architecture and algorithmic choices, which can surpass fine-tuning performance (Figures 1 and 5).

Okay so agents can communicate weights between architectures. That's a reasonable conclusion. Sort of like a parent teaching their child how to human.

I thought language models already do this at inference time. So the goal of the RRL method is to subvert the agent's trust..?

anonymousTestPoster t1_ival53k wrote on November 6, 2022 at 3:44 PM

How is this idea different to using pre-trained networks (functions) then adapting these for a new problem context?

smallest_meta_review OP t1_ivancqx wrote on November 6, 2022 at 3:59 PM

Good question. I feel it's going one step further and saying why not reuse prior computational work (e.g., existing learned agents) in the same problem especially if that problem is computationally demanding (large scale RL papers do this but research papers don't). So, next time we train a new RL agent, we reuse prior computation rather than starting from scratch (e.g., we train new agents on Atari games given a pretrained DQN agent from 2015).

Also, in reincarnating RL, we don't have to stick to the same pretrained network architecture and can possibly try some other architecture too.

luchins t1_ivbuz90 wrote on November 6, 2022 at 8:40 PM

> I feel it's going one step further and saying why not reuse prior computational work (e.g., existing learned agents) in the same problem

Could you give me an example, please? I don't get what you mean by using agents with different architectures.

smallest_meta_review OP t1_ivcghme wrote on November 6, 2022 at 11:05 PM

Oh, so one of the examples in the blog post is that we start with a DQN agent with a 3-layer CNN architecture and reincarnate another Rainbow agent with a ResNet architecture (Impala-CNN) using the QDagger approach for reincarnation. Once reincarnated, the ResNet Rainbow agent is further trained with RL to maximize reward. See the paper here for more details: https://openreview.net/forum?id=t3X5yMI_4G2
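To make the "different architectures" part concrete, here is a toy sketch (not the paper's code, and not QDagger itself): a fixed linear Q-function stands in for the small pretrained teacher, and an MLP student with entirely different parameter shapes is trained to match the teacher's outputs. All sizes, the MSE objective, and the random stand-in states are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions, hidden = 4, 3, 16

# Teacher: a fixed linear Q-function (stand-in for the small pretrained agent).
teacher_w = rng.standard_normal((n_features, n_actions))

def teacher_q(states):
    return states @ teacher_w

# Student: a one-hidden-layer MLP. No teacher weights are copied into it.
w1 = 0.1 * rng.standard_normal((n_features, hidden))
b1 = np.zeros(hidden)
w2 = 0.1 * rng.standard_normal((hidden, n_actions))
b2 = np.zeros(n_actions)

def student_q(states):
    h = np.tanh(states @ w1 + b1)
    return h @ w2 + b2, h

def mse(states):
    return np.mean((student_q(states)[0] - teacher_q(states)) ** 2)

eval_states = rng.standard_normal((256, n_features))
loss_before = mse(eval_states)

lr = 0.05
for _ in range(2000):
    s = rng.standard_normal((64, n_features))      # random stand-in states
    q, h = student_q(s)
    grad_q = 2.0 * (q - teacher_q(s)) / q.size     # gradient of the mean squared error
    grad_w2 = h.T @ grad_q
    grad_b2 = grad_q.sum(axis=0)
    grad_pre = (grad_q @ w2.T) * (1.0 - h ** 2)    # backprop through tanh
    grad_w1 = s.T @ grad_pre
    grad_b1 = grad_pre.sum(axis=0)
    w1 -= lr * grad_w1
    b1 -= lr * grad_b1
    w2 -= lr * grad_w2
    b2 -= lr * grad_b2

loss_after = mse(eval_states)
print(loss_before, loss_after)
```

The transfer happens purely through the teacher's outputs, which is why the two networks don't need to share a layer layout. In the setup described above, the student would then keep training with its own RL loss.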

pm_me_your_pay_slips t1_ivai3l1 wrote on November 6, 2022 at 3:24 PM

I THOUGHT REWARD WAS ALL YOU NEED

smallest_meta_review OP t1_ivanqcm wrote on November 6, 2022 at 4:01 PM

Haha, if you have tons of compute and several lifetimes to wait for tabula rasa RL to solve real problems :)

BobDope t1_iva27kx wrote on November 6, 2022 at 1:26 PM

Was it dead?

smallest_meta_review OP t1_iva2n3z wrote on November 6, 2022 at 1:30 PM

LOL. This is what I clarify before I talk about this. Here it's in the context of reincarnating an existing RL agent to a new agent (possibly with a different architecture and algorithm).

BobDope t1_iva3q3o wrote on November 6, 2022 at 1:38 PM

Ok that’s pretty dope

No_Contribution9334 t1_ivajqhv wrote on November 6, 2022 at 3:34 PM

So well explained!

whothatboah t1_ivadajt wrote on November 6, 2022 at 2:50 PM

very ignorantly speaking, but giving a bit of genetic algorithm vibes...

Dendriform1491 t1_ivaj27w wrote on November 6, 2022 at 3:30 PM

At least in nature this happens because the environment is always changing and the value of training decays (some sort of "data drift").

luchins t1_ivazlrh wrote on November 6, 2022 at 5:20 PM

following

[R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain
