Submitted by NaturalGradient t3_10lot3v in MachineLearning

Find the release notes here:

https://github.com/nnaisense/evotorch/releases/tag/v0.4.0

A big highlight is how fast these implementations are! I genuinely believe GPU acceleration is the future of evolutionary algorithms, and EvoTorch's integration into the PyTorch ecosystem is a fantastic enabler for this.

To demonstrate the raw speed provided by the new release, I compared EvoTorch's CMA-ES implementation to that provided by the popular pycma package on the 80-dimensional Rastrigin problem and tracked the run-time:

Performance was measured over 50 runs on the 80-dimensional Rastrigin problem
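
For readers unfamiliar with the benchmark, here is a minimal sketch of the Rastrigin function in plain Python. It assumes the usual textbook definition with the constant set to 10; it is not the code used by EvoTorch or pycma, and the name `rastrigin` is just illustrative.

```python
import math


def rastrigin(x, a=10.0):
    """Textbook Rastrigin function (assumed constant a = 10).

    Highly multimodal, with a global minimum of 0 at the origin.
    """
    return a * len(x) + sum(
        xi * xi - a * math.cos(2.0 * math.pi * xi) for xi in x
    )


# Evaluate at the known global minimum of the 80-dimensional problem.
origin_value = rastrigin([0.0] * 80)
```

A single evaluation is only a handful of arithmetic operations per dimension, so on this benchmark the run-time is dominated by the optimizer itself rather than by the fitness function.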

The crazy thing to note is that when we switch to GPU (Tesla V100), we can efficiently run CMA-ES with population sizes of 100k and beyond!

153

Comments


ML4Bratwurst t1_j5z42ky wrote

Call me picky, but I would not use an ML library that is not GPU accelerated. This should be the default.

−12

lucidraisin t1_j5z7z6g wrote

CMA-ES! definitely playing around with this, thank you!

6

fernandocamargoti t1_j5zs45e wrote

They're not about learning from data, they are about optimization. They are from the broader AI field of study, but I wouldn't say they are ML. They serve a different purpose. There is some research about using them to optimize models (instead of using gradient descent), but that's not their main use case.

−7

ReginaldIII t1_j5zv9gz wrote

That's such a tenuous distinction, and you're wrong anyway because you can pose any learning-from-data problem as a generic optimization problem.

They're very useful when your loss function is not differentiable but you still want to fit a model to input+output data pairs.

They're also useful when your model parameters have domain-specific meaning and you can derive rules for how two parameter sets can be meaningfully combined with one another.

Decision trees and random forests are ML too. What you probably mean is Deep Learning. But even that has a fuzzy boundary to surrounding methods.

Being a prescriptivist with these definitions is a waste of time because the research community as a whole cannot draw clear lines in the sand.

10

ReginaldIII t1_j5zvqal wrote

Okay, you're picky :p

Try deploying a model for realtime online learning of streaming sensor data that needs to run on battery power and then insist it needs to run on GPUs.

Plenty of legitimate use cases for non-GPU ML.

7

ReginaldIII t1_j5zzhj1 wrote

Pick the tools that work for the problems you have. If you are online training a model on an embedded device you need something optimized for that hardware.

I gave you a generic example of a problem domain where this applies. You can search for online training on embedded devices if you are interested but I can't talk about specific applications because they are not public.

All I'm saying is drawing a line in the sand and saying you'd never use X if it doesn't have Y is silly because what if you end up working on something in the future where the constraints are different?

5

Ulfgardleo t1_j603u8t wrote

In my experience, this is never the bottleneck. Rastrigin does not cost much to evaluate; the real functions you would consider evolution for do. I did research in speeding up CMA-ES and in the end it felt like a useless exercise in matrix algebra for that reason.

Yes, in theory being able to speed up matrix operations is nice, but doing stuff in higher dimensions (80 is kinda irrelevant computationally, even on a CPU) always has to fight against the O(1/n) convergence rate of all evo algorithms.

So all this is likely good for is benchmarking these algorithms in a regime that is practically irrelevant for evolution.

5

NaturalGradient OP t1_j60iyek wrote

It depends what you're trying to do :)

If you want to run GPU-accelerated neuroevolution in Brax or IsaacGym, then keeping everything on GPU is absolutely relevant. Similarly, if you're trying to do MPC or any optimisation of an NN input, then it's still very useful to be on the GPU. As you said, benchmarking is another place this GPU acceleration can be very helpful. Basically anywhere where the fitness evaluation isn't the only bounding factor.

For expensive/CPU-bound fitness functions, we have other utilities too! For example, with a single flag you can distribute your fitness evaluation across multiple actors using Ray. This means you can scale to an entire CPU cluster effortlessly!
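
The general idea of farming out fitness evaluations can be sketched with only the Python standard library. To be clear, this is not EvoTorch's Ray-based mechanism: a thread pool stands in for the actors, and `expensive_fitness` and `evaluate_population` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def expensive_fitness(x):
    # Hypothetical stand-in for a costly evaluation, e.g. a simulator call.
    # Here it is just the sphere function so the sketch stays self-contained.
    return sum(xi * xi for xi in x)


def evaluate_population(population, num_workers=4):
    # Hand each candidate solution to a worker in the pool.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(expensive_fitness, population))


population = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
fitnesses = evaluate_population(population)
```

Threads only help when the evaluation releases the GIL (I/O, an external simulator, native code); for pure-Python fitness functions, separate processes or a cluster framework would be needed to get real parallelism.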

7

NaturalGradient OP t1_j60jc3z wrote

Great to hear! I actually led the CMA-ES effort and tried very hard to match the fine details of pycma so that the performance is comparable. If you run into any unexpected behavior please do open a GitHub issue or reach out to me directly. There are a lot of fine details in a practical CMA-ES implementation, so I'd really like to know if I missed anything.

7

programmerChilli t1_j60s9pz wrote

Have you tried out the PyTorch 2.0 compilation feature (i.e. torch.compile)? Might help a lot for evolutionary computation stuff.

2

fernandocamargoti t1_j60xagg wrote

Well, what you're talking about are some ways to use evolutionary algorithms to optimize the parameters of an ML model. But in my eyes, that doesn't mean it is ML. They share a lot, but they aren't the same. For me, evolutionary algorithms are part of metaheuristics, which is part of AI (which ML is also part of). Different areas and sub-areas of research do interact with each other. I just mean that the "is" part is a bit too much in this case.

−1

pythonpeasant t1_j614dq6 wrote

THIS IS HUGE!!!!

Please go back to the AttentionNeuron and AttentionAgent papers and retrain them on GPU with big population sizes!

3

ReginaldIII t1_j61nlno wrote

Trying to force these things into a pure hierarchy sounds nothing short of an exercise in pedantry.

And to what end? You make up your own distinctions that no one else agrees with and you lose your ability to communicate ideas to people because you're talking a different language to them.

If you are so caught up on the "is a" part, have you studied any programming languages that support "multiple inheritance"?

2

Mefaso t1_j61zim5 wrote

>If you want to run GPU-accelerated neuroevolution in Brax or IsaacGym, then keeping everything on GPU is absolutely relevant

Do you have evidence for that?

I would assume that running Brax rollouts, for example, would take 100x as long as the actual CMA-ES.

2

danielgafni t1_j62mh4o wrote

How does it compare to EvoJAX? A huge deal there is training all the networks in the population in parallel. This gives absolutely massive speedups, as you can imagine. Can EvoTorch do it?

1