cthorrez t1_jclwi3d wrote

Fast inference is also important to anyone who wants to deploy these things.

6

mike94025 t1_jcmepd6 wrote

With Better Transformer, why not both?

1

cthorrez t1_jcmhg8m wrote

Because the algorithm seems to work only for inference. Probably due to memory management of the cached activations or something. (I don't know the actual technical reasons.)

1

mike94025 t1_jcmlddm wrote

Better Transformer supports both training and inference today. Some optimizations are still inference-only (in particular, support for variable-sequence-length Nested Tensor), and the inference fastpath is a bit siloed, but nothing that a future PyTorch update could not fix.

1