Submitted by super_deap t3_11tmpc5 in MachineLearning
Dependent_Ad5120 t1_jd1d00j wrote
Reply to comment by mike94025 in [D] PyTorch 2.0 Native Flash Attention 32k Context Window by super_deap
It seems that I have to call model.eval() to use memory-efficient attention; otherwise it throws a "no available kernel" error.
I tried this on both an RTX 3090 and an A100. In both cases, enabling only the flash backend (enable_flash=True) resulted in the same "no available kernel" error, even with model.eval().
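In case it helps, here is roughly what I'm doing (a minimal sketch with made-up shapes, using the torch.backends.cuda.sdp_kernel context manager and F.scaled_dot_product_attention from PyTorch 2.0; my real code wraps an nn.Module around this):

```python
import torch
import torch.nn.functional as F

# Example shapes only: (batch, heads, seq_len, head_dim), fp32 on CUDA.
q = torch.randn(8, 16, 1024, 64, device="cuda")
k = torch.randn(8, 16, 1024, 64, device="cuda")
v = torch.randn(8, 16, 1024, 64, device="cuda")

# Memory-efficient backend: works for me, but only after model.eval().
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=False, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)

# Flash backend only: this is the case that raises "no available kernel" for me.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)
```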
So my questions are:
- If I have to call model.eval(), does that mean dropout is disabled, so I can't use this during training?
- Am I doing something wrong with flash attention? How do I actually enable it?
Thanks a lot!
Dependent_Ad5120 t1_jd3knio wrote
OK, I found out why. To use flash attention, I had to use fp16. It is a bit faster than memory-efficient attention in my test.
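For reference, a minimal sketch of what worked for me (same sdp_kernel context manager as above; the fp16 cast on the inputs is the only change):

```python
import torch
import torch.nn.functional as F

# The flash kernel needs half precision on CUDA, so create (or cast) the
# query/key/value tensors as fp16 before calling scaled_dot_product_attention.
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)

with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)  # no "no available kernel" error now

# For a full model, model.half() or torch.autocast("cuda", dtype=torch.float16)
# gets the inputs into fp16 the same way.
```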