nicku_a OP t1_jdlluja wrote on March 25, 2023 at 8:32 AM

Reply to comment by jomobro117 in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

And yes, the plan is to offer distributed training! As you can imagine, there are about a million things we want/need to add! If you would like to get involved in the project and help out, please do.

nicku_a OP t1_jdllrfv wrote on March 25, 2023 at 8:31 AM

Reply to comment by jomobro117 in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

Hey! Yes, there are similarities to PBT, but there are a few differences here. Firstly, the mutations implemented with AgileRL are much more dynamic. Rather than only mutating hyperparameters, we’re allowing any part of the algorithm/model to mutate - HPs, network architecture (layers and nodes), activation functions and network weights themselves. We also train the population in one go, and offer efficient learning by sharing experience within the population.
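To make the kinds of mutation listed above a bit more concrete, here is a minimal sketch in plain Python. It is not AgileRL's actual code: `AgentSpec`, `ACTIVATIONS` and `mutate` are hypothetical names made up for illustration, and the "weights" are a flat list of floats rather than real network tensors.

```python
import copy
import random
from dataclasses import dataclass, field

# Hypothetical stand-in for an agent; not an AgileRL class.
@dataclass
class AgentSpec:
    learning_rate: float = 1e-3
    hidden_sizes: list = field(default_factory=lambda: [64, 64])
    activation: str = "relu"
    weights: list = field(default_factory=lambda: [0.0] * 8)

# Assumed set of activation choices for this sketch.
ACTIVATIONS = ["relu", "tanh", "elu"]

def mutate(agent, rng):
    """Return (mutated copy, kind of mutation). The parent is left untouched."""
    child = copy.deepcopy(agent)
    kind = rng.choice(["hp", "layer", "nodes", "activation", "weights"])
    if kind == "hp":
        # Hyperparameter mutation: scale the learning rate up or down.
        child.learning_rate *= rng.choice([0.8, 1.2])
    elif kind == "layer":
        # Architecture mutation: add a hidden layer.
        child.hidden_sizes.append(child.hidden_sizes[-1])
    elif kind == "nodes":
        # Architecture mutation: widen one hidden layer.
        i = rng.randrange(len(child.hidden_sizes))
        child.hidden_sizes[i] += 16
    elif kind == "activation":
        # Swap to a different activation function.
        options = [a for a in ACTIVATIONS if a != child.activation]
        child.activation = rng.choice(options)
    else:
        # Weight mutation: add small Gaussian noise to every weight.
        child.weights = [w + rng.gauss(0.0, 0.1) for w in child.weights]
    return child, kind
```

A real implementation would also have to resize and preserve the trained weights when a layer or node is added, which this sketch ignores.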

nicku_a OP t1_jdkdxy8 wrote on March 25, 2023 at 12:52 AM

Reply to comment by LifeScientist123 in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

Good question! So what we’re doing here is not specifically applying evolutionary algorithms instead of RL. We’re applying evolutionary algorithms as a method of HPO, while still using RL to learn and keeping its advantages. Take a look at my other comments explaining how this works, and check out the docs for more information.

nicku_a OP t1_jdkd7qf wrote on March 25, 2023 at 12:46 AM

Reply to comment by Modruc in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

We’ve also shown that using these libraries as they come is far slower for real problems than what we can offer!

nicku_a OP t1_jdkd2ax wrote on March 25, 2023 at 12:45 AM

Reply to comment by Modruc in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

Libraries like stable baselines/rl zoo are actually quite inflexible and hard to fit to your own problem. We’re introducing (with plans to add way more!) RL algorithms that you can use, edit and tune to your specific needs faster and more flexibly.

nicku_a OP t1_jdi7ks3 wrote on March 24, 2023 at 4:00 PM

Reply to comment by [deleted] in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

So this is a training framework that can be used to train agents in gym environments (and other custom environments that use the gym-style format).

You can select a gym environment by name, e.g. 'LunarLander-v2', when creating the environment, and then train an agent on it. See the docs for more info.
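For anyone unsure what "gym-style format" means for a custom environment, here is a rough sketch of the interface: a `reset()` that returns the first observation and a `step(action)` that returns the next observation, a reward, a done flag and an info dict. `CoinFlipEnv` and `run_episode` are hypothetical names invented for this example, and the sketch uses the classic four-value `step` return; newer gym/Gymnasium versions return five values, with the done flag split into terminated and truncated.

```python
import random

class CoinFlipEnv:
    """Toy environment with a gym-style reset/step interface (hypothetical)."""

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 if the action matches the coin that was just observed.
        reward = 1.0 if action == self.coin else 0.0
        self.steps += 1
        self.coin = self.rng.randint(0, 1)  # next observation
        done = self.steps >= self.max_steps
        return self.coin, reward, done, {}

def run_episode(env, policy):
    """Run one episode and return the total reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

A policy that simply echoes the observation, `run_episode(CoinFlipEnv(seed=0), lambda obs: obs)`, collects the reward on every step, which is a quick sanity check that the loop is wired correctly.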

nicku_a OP t1_jdhyy1d wrote on March 24, 2023 at 3:05 PM

Reply to comment by boyetosekuji in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

hahaha I love it

nicku_a OP t1_jdhyqr6 wrote on March 24, 2023 at 3:03 PM

Reply to comment by Peantoo in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

You can help! Please join the Discord and get involved. We’d love to have you!

nicku_a OP t1_jdhrwgo wrote on March 24, 2023 at 2:18 PM

Reply to comment by paramkumar1992 in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

Thanks, that's the plan!

nicku_a OP t1_jdhra2m wrote on March 24, 2023 at 2:14 PM

Reply to comment by Puzzleheaded_Acadia1 in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

Sure! Traditionally, hyperparameter optimization (HPO) for reinforcement learning (RL) is particularly difficult when compared to other types of machine learning. This is for several reasons, including the relative sample inefficiency of RL and its sensitivity to hyperparameters.
AgileRL is initially focused on improving HPO for RL in order to allow faster development with robust training. Evolutionary algorithms have been shown to converge to optimal hyperparameters faster than other HPO methods, and to do so automatically, by taking advantage of shared memory between a population of agents acting in identical environments.
At regular intervals, after learning from shared experiences, a population of agents can be evaluated in an environment. Through tournament selection, the best agents are selected to survive until the next generation, and their offspring are mutated to further explore the hyperparameter space. Eventually, the optimal hyperparameters for learning in a given environment can be reached in significantly fewer steps than are required using other HPO methods.
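The evaluate, select and mutate cycle described above can be sketched roughly like this. It is a toy illustration under stated assumptions, not AgileRL's implementation: agents are plain dicts with a single `lr` hyperparameter, `toy_fitness` is a made-up stand-in for evaluating an agent in an environment, and the function names, tournament size `k`, mutation scale `sigma` and the choice to keep one elite agent are all hypothetical.

```python
import math
import random

def toy_fitness(agent, target_lr=1e-3):
    # Stand-in for an environment evaluation: closer to target_lr is better.
    return -abs(math.log10(agent["lr"] / target_lr))

def tournament_select(population, fitnesses, k, rng):
    """Draw k random contenders and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]

def next_generation(population, fitnesses, rng, k=2, sigma=0.2):
    """Build the next generation: keep the best agent, mutate the rest."""
    elite_idx = max(range(len(population)), key=lambda i: fitnesses[i])
    children = [population[elite_idx]]  # elite survives unchanged
    while len(children) < len(population):
        parent = tournament_select(population, fitnesses, k, rng)
        # Mutation: multiplicative noise keeps the learning rate positive.
        children.append({"lr": parent["lr"] * math.exp(rng.gauss(0.0, sigma))})
    return children

def evolve(pop_size=6, generations=20, seed=0):
    """Run the loop and return the final population and best-fitness history."""
    rng = random.Random(seed)
    population = [{"lr": 10 ** rng.uniform(-5, -1)} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fitnesses = [toy_fitness(a) for a in population]
        history.append(max(fitnesses))
        population = next_generation(population, fitnesses, rng)
    return population, history
```

In the real setting the agents would also train on shared experience between evaluations. Here the fitness is a fixed function of the learning rate, so the sketch only shows the selection and mutation side of the loop.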

nicku_a OP t1_jdhr0vk wrote on March 24, 2023 at 2:12 PM

Reply to comment by Riboflavius in [P] Reinforcement learning evolutionary hyperparameter optimization - 10x speed up by nicku_a

Thanks!! Plenty more work to be done. Please share with anyone you think would be interested!