cthorrez t1_jclwi3d wrote

Fast inference is also important to anyone who wants to deploy these things.

6

mike94025 t1_jcmepd6 wrote

With Better Transformer, why not both?

1

cthorrez t1_jcmhg8m wrote

Because the algorithm seems to work only for inference. Probably due to memory management of the cached activations or something. (I don't know the actual technical reasons.)

1

mike94025 t1_jcmlddm wrote

Better Transformer supports both training and inference today. Some optimizations are still inference-only (in particular, support for variable-sequence-length Nested Tensor), and the inference fastpath is a bit siloed, but nothing that a future PyTorch update could not fix.

1