Dependent_Ad5120

Dependent_Ad5120 t1_jd1d00j wrote

It seems that I have to call model.eval() to use memory-efficient attention. Otherwise, it throws a "no available kernel" error.

I tried on both an RTX 3090 and an A100. In both cases, enabling only enable_flash=True resulted in the same "no available kernel" error, even with model.eval().
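For reference, a minimal sketch of the call pattern in question. It assumes the PyTorch 2.0 `torch.backends.cuda.sdp_kernel` context manager together with `F.scaled_dot_product_attention`; the helper name `flash_only_attention`, the tensor shapes, and the fp16 dtype are placeholders for illustration, not the actual model.

```python
import torch
import torch.nn.functional as F


def flash_only_attention(q, k, v, dropout_p=0.0):
    """Run scaled dot-product attention with only the flash kernel allowed.

    Hypothetical helper. With the math and memory-efficient backends
    disabled, PyTorch raises a "no available kernel" RuntimeError whenever
    the inputs do not meet the flash kernel's requirements.
    """
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        return F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)


out = None
if torch.cuda.is_available():
    # Assumed shapes: (batch, heads, seq_len, head_dim).
    # fp16 is an assumption here: the flash kernel is documented as
    # requiring half-precision (fp16/bf16) CUDA tensors.
    q, k, v = (
        torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
        for _ in range(3)
    )
    out = flash_only_attention(q, k, v)
    print(out.shape)
```

If the inputs in the real model are fp32, that alone might explain the error with the flash kernel, but that is a guess rather than something confirmed here.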

So my questions are:

  1. With model.eval(), does that mean dropout is not enabled during training?
  2. Am I doing something wrong for flash attention? How do I actually enable it?
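To make question 1 concrete, here is a small sketch of how train/eval mode toggles dropout. It assumes a bare `torch.nn.Dropout` layer standing in for the real model; the layer, the input tensor, and `p=0.5` are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

drop = nn.Dropout(p=0.5)  # stand-in for the dropout inside a real model
x = torch.ones(1000)

drop.train()              # training mode: dropout is active
train_out = drop(x)       # some elements zeroed, survivors scaled by 1/(1-p)

drop.eval()               # eval mode: dropout becomes a no-op
eval_out = drop(x)        # output equals the input

print(train_out.unique())          # values present in training mode
print(torch.equal(eval_out, x))    # whether eval mode left x unchanged
```

So if the model stays in eval mode for the whole run, dropout layers would be skipped during training as well, which is what the question is worried about.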

Thanks a lot!
