
royalemate357 t1_jckqgsr wrote

Pretty sure the main improvement is `torch.compile`, which can optimize your code with a nice, easy one-liner. There are some other nice quality-of-life improvements, like the built-in flash attention OP is using, and I think some distributed training stuff. But it's fully backwards compatible, which is great (looking at you, TensorFlow): https://pytorch.org/get-started/pytorch-2.0/#pytorch-2x-faster-more-pythonic-and-as-dynamic-as-ever

43

MoistYogurtcloset400 t1_jclv6r1 wrote

Is torch.compile only compatible with CUDA devices?

5

royalemate357 t1_jclz4t0 wrote

Hmm, I'm not too sure, but their blog post says this:

>TorchInductor uses a pythonic define-by-run loop level IR to automatically map PyTorch models into generated Triton code on GPUs and C++/OpenMP on CPUs.

So it seems like they support CPU. I also tried it briefly on a CPU-only Google Colab instance, and it seems to work (I didn't benchmark speed, though). I doubt it supports non-CUDA GPUs, but then again, support for those isn't very good even in the general case.
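For anyone who wants to repeat that kind of CPU-only check, here is a minimal sketch. The tiny model, the tensor shapes, and the helper name `compiled_matches_eager` are made up for illustration; only `torch.compile` and its `backend` argument come from PyTorch 2.0 itself. It only checks that the compiled model agrees with eager mode and says nothing about speed.

```python
import torch
import torch.nn as nn


def compiled_matches_eager(backend: str = "inductor") -> bool:
    """Compile a tiny CPU model and compare its output with eager mode.

    The model and shapes are arbitrary; this is a sanity check, not a benchmark.
    """
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).to("cpu")
    model.eval()
    x = torch.randn(8, 16, device="cpu")

    # The "one-liner": wrap the existing module, nothing else changes.
    compiled = torch.compile(model, backend=backend)

    with torch.no_grad():
        expected = model(x)
        actual = compiled(x)  # compilation is triggered by this first call
    return torch.allclose(expected, actual, atol=1e-5)
```

Calling `compiled_matches_eager()` uses the default `"inductor"` backend, which on CPU needs a working C++ compiler on the machine. Passing `backend="eager"` skips code generation and only exercises the graph capture step, which is handy if no compiler is available.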

8

mike94025 t1_jcn7ksu wrote

Works for all. You need a compiler backend that can code-gen for your target, and need a frontend for the optimizer that can process the IR.

Alternatively, you need a backend for Triton (or another already supported optimizer) that can codegen for your target architecture.
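To make the "compiler backend" idea a bit more concrete, here is a rough sketch of plugging a custom backend into `torch.compile`. It relies on the documented convention that a backend is a callable receiving a `torch.fx.GraphModule` plus example inputs and returning something callable. The names `inspecting_backend`, `captured_graphs`, and `toy_fn` are hypothetical, and this backend does no code generation at all: it just records the captured graph and runs it unchanged, which is where a real backend would emit code for its target.

```python
from typing import Callable, List

import torch
import torch.fx

# Graphs handed to the backend are stored here so they can be inspected later.
captured_graphs: List[torch.fx.GraphModule] = []


def inspecting_backend(
    gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]
) -> Callable:
    """Toy backend: record the captured FX graph and run it without changes.

    A real backend would lower `gm` to code for its target device here.
    """
    captured_graphs.append(gm)
    return gm.forward


def toy_fn(x: torch.Tensor) -> torch.Tensor:
    return torch.sin(x) + torch.cos(x)


compiled_fn = torch.compile(toy_fn, backend=inspecting_backend)
x = torch.randn(4)
result = compiled_fn(x)  # the first call triggers graph capture and the backend
```

After the call, `captured_graphs[0].graph` holds the operations the frontend extracted from `toy_fn`, which is the IR a backend for a new device would have to be able to process.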

4

programmerChilli t1_jcny4qx wrote

We currently officially support CUDA and CPU, although in principle it could be used for other backends too.

3