reSAMpled t1_iuj2yrm wrote
Reply to comment by pommedeterresautee in [P] Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels by pommedeterresautee
I also haven't used spaCy in a while, but I'm pretty sure there is no way to make this work with `-sm`, `-md`, or `-lg` models. What Michaël says should be true for `-trf` models, though I don't think it will be easy: spacy-transformers already has to wrap HF models so they expose a thinc API, so you would have to dig deep in there to call Kernl's `optimize_model`.