nicku_a

nicku_a OP t1_jdllrfv wrote

Hey! Yes, there are similarities to PBT, but there are a few differences here. Firstly, the mutations implemented with AgileRL are much more dynamic. Rather than only mutating hyperparameters, we’re allowing any part of the algorithm/model to mutate - HPs, network architecture (layers and nodes), activation functions and network weights themselves. We also train the population in one go, and offer efficient learning by sharing experience within the population.
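
To make the "anything can mutate" idea concrete, here is a rough sketch in plain Python. It is not AgileRL's actual API: the agent is a hypothetical config dict, and `mutate_agent`, `MUTATION_KINDS` and the step sizes are all made up for illustration. Weight mutation is left out to keep the example short.

```python
import copy
import random

# Hypothetical names and values, for illustration only (not AgileRL's API).
ACTIVATIONS = ["relu", "tanh", "gelu"]
MUTATION_KINDS = ["hyperparameter", "add_layer", "add_nodes", "activation"]

def mutate_agent(agent, rng):
    """Return a mutated copy of `agent` and the kind of mutation applied."""
    child = copy.deepcopy(agent)  # never modify the parent in place
    kind = rng.choice(MUTATION_KINDS)
    if kind == "hyperparameter":
        # Scale the learning rate up or down by 20%.
        child["lr"] *= rng.choice([0.8, 1.2])
    elif kind == "add_layer":
        # Append a hidden layer the same width as the last one.
        child["hidden_sizes"].append(child["hidden_sizes"][-1])
    elif kind == "add_nodes":
        # Widen one randomly chosen hidden layer.
        i = rng.randrange(len(child["hidden_sizes"]))
        child["hidden_sizes"][i] += 16
    else:
        # Swap to a different activation function.
        options = [a for a in ACTIVATIONS if a != child["activation"]]
        child["activation"] = rng.choice(options)
    return child, kind

rng = random.Random(0)
parent = {"lr": 1e-3, "hidden_sizes": [64, 64], "activation": "relu"}
child, kind = mutate_agent(parent, rng)
print(kind, child)
```

In a real setup an architecture mutation would also need to carry over the existing weights into the resized network, which this sketch does not attempt.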

1

nicku_a OP t1_jdkdxy8 wrote

Good question! So what we’re doing here is not applying evolutionary algorithms instead of RL. We’re applying evolutionary algorithms as a method of HPO, while still using RL to learn and keeping its advantages. Take a look at my other comments explaining how this works, and check out the docs for more information.

1

nicku_a OP t1_jdhra2m wrote

Sure! Traditionally, hyperparameter optimization (HPO) for reinforcement learning (RL) is particularly difficult when compared to other types of machine learning. This is for several reasons, including the relative sample inefficiency of RL and its sensitivity to hyperparameters.
AgileRL is initially focused on improving HPO for RL in order to allow faster development with robust training. Evolutionary algorithms have been shown to allow faster, automatic convergence to optimal hyperparameters than other HPO methods by taking advantage of shared memory between a population of agents acting in identical environments.
At regular intervals, after learning from shared experiences, a population of agents can be evaluated in an environment. Through tournament selection, the best agents are selected to survive until the next generation, and their offspring are mutated to further explore the hyperparameter space. Eventually, the optimal hyperparameters for learning in a given environment can be reached in significantly fewer steps than are required using other HPO methods.
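
The evaluate, select, mutate loop described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not AgileRL's implementation: `toy_fitness` stands in for evaluating an agent in an environment (it simply peaks at a learning rate of 1e-3), the RL training step between generations is omitted, and all function names and constants are hypothetical.

```python
import random

def toy_fitness(hparams):
    """Stand-in for an evaluation run: best score at lr = 1e-3 (assumption)."""
    return -abs(hparams["lr"] - 1e-3)

def tournament_select(population, fitnesses, tournament_size, rng):
    """Sample `tournament_size` individuals and return a copy of the fittest."""
    contenders = rng.sample(range(len(population)), tournament_size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return dict(population[winner])

def mutate(hparams, rng, scale=0.2):
    """Scale the learning rate up or down by `scale` (hypothetical mutation)."""
    child = dict(hparams)
    child["lr"] *= rng.choice([1 - scale, 1 + scale])
    return child

def evolve(pop_size=6, generations=20, seed=0):
    rng = random.Random(seed)
    # Start from learning rates spread log-uniformly over [1e-5, 1e-1].
    population = [{"lr": 10 ** rng.uniform(-5, -1)} for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # (In a real run, each agent would train on shared experience here.)
        fitnesses = [toy_fitness(h) for h in population]
        best_history.append(max(fitnesses))
        # Elitism: the best agent survives unchanged into the next generation.
        elite_idx = max(range(pop_size), key=lambda i: fitnesses[i])
        elite = dict(population[elite_idx])
        # The rest of the next generation are mutated tournament winners.
        offspring = [
            mutate(tournament_select(population, fitnesses, 2, rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + offspring
    return population, best_history

population, best_history = evolve()
print(best_history[0], best_history[-1])
```

Because the elite is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next; the mutated offspring are what keep exploring the hyperparameter space.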

26