
royalemate357 t1_jclz4t0 wrote

Hmm, I'm not too sure, but their blog post says this:

>TorchInductor uses a pythonic define-by-run loop level IR to automatically map PyTorch models into generated Triton code on GPUs and C++/OpenMP on CPUs.

so it seems like they support CPU. I also tried it briefly on Google Colab CPU-only, and it seems to work (I didn't benchmark speed though). I doubt it supports non-CUDA GPUs, but then again support for those isn't very good even in the general case.
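For reference, a CPU smoke test looks something like this (a minimal sketch; the toy model and sizes are just placeholders):

```python
import time
import torch

# A small toy model; anything works for a smoke test.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 128),
)

# torch.compile defaults to the TorchInductor backend, which generates
# C++/OpenMP code when the inputs live on CPU.
compiled = torch.compile(model)

x = torch.randn(32, 128)  # plain CPU tensor, no CUDA anywhere

compiled(x)  # first call triggers compilation, so warm up before timing
start = time.perf_counter()
compiled(x)
print(f"one forward pass: {time.perf_counter() - start:.6f}s")
```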

8

mike94025 t1_jcn7ksu wrote

Works for all. You need a compiler backend that can code-gen for your target, and a frontend for the optimizer that can process the IR.

Alternatively, you need a backend for Triton (or another already-supported optimizer) that can code-gen for your target architecture.
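To illustrate the "pluggable backend" part: a TorchDynamo backend is just a callable over the captured FX graph. This toy sketch (the `toy_backend` name is made up, but the `backend=` hook is real) shows the shape of it; a real backend would do actual codegen instead of running the graph eagerly:

```python
import torch

model = torch.nn.Linear(64, 64)

# The default backend is "inductor": Triton on GPUs, C++/OpenMP on CPUs.
compiled = torch.compile(model, backend="inductor")

# A backend receives the captured FX graph plus example inputs and returns
# something callable. This toy one does no codegen at all and just runs
# the graph eagerly; a real one would lower `gm` to the target here.
def toy_backend(gm: torch.fx.GraphModule, example_inputs):
    return gm.forward

compiled_custom = torch.compile(model, backend=toy_backend)
print(compiled_custom(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```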

4

programmerChilli t1_jcny4qx wrote

We currently officially support CUDA and CPU, although in principle it could be used with other backends too.
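You can check what's registered on your own install with something like:

```python
import torch._dynamo as dynamo

# Names of the compiler backends registered in this PyTorch build,
# e.g. ['cudagraphs', 'inductor', 'onnxrt', ...] depending on the install.
print(dynamo.list_backends())
```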

3