Submitted by alkibijad t3_10raouh in MachineLearning

I use Hugging Face's transformers regularly for experiments, but I plan to deploy some of the models to iOS.

I found Apple's ml-ane-transformers repo, which shows how transformers can be rewritten to run much faster on Apple devices. It includes an example of DistilBERT implemented in that optimized way.
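As far as I understand Apple's accompanying write-up, a central trick in that rewrite is to keep activations in a channels-first (B, C, 1, S) layout and replace each nn.Linear with an equivalent 1x1 nn.Conv2d, which the Neural Engine handles better. The pure-Python sketch below (no PyTorch; nested lists stand in for tensors, and all helper names are my own, hypothetical ones) only checks the arithmetic behind that swap: a linear layer over (B, S, C) produces the same numbers as a 1x1 convolution over the transposed (B, C, 1, S) tensor, once the weight is reshaped from (out, in) to (out, in, 1, 1).

```python
# Sketch: Linear on (B, S, C) == 1x1 Conv2d on (B, C, 1, S).
# Nested lists stand in for tensors; helper names are hypothetical.

def linear_bsc(x, w, b):
    """Linear layer on a [B][S][C_in] tensor; w is [C_out][C_in], b is [C_out]."""
    return [
        [
            [sum(w_o[i] * tok[i] for i in range(len(tok))) + b[o] for o, w_o in enumerate(w)]
            for tok in seq
        ]
        for seq in x
    ]


def to_bc1s(x):
    """Transpose [B][S][C] into the channels-first [B][C][1][S] layout."""
    return [
        [[[seq[s][c] for s in range(len(seq))]] for c in range(len(seq[0]))]
        for seq in x
    ]


def linear_weight_to_conv(w):
    """Reshape an (out, in) Linear weight into an (out, in, 1, 1) Conv2d weight."""
    return [[[[w_oi]] for w_oi in row] for row in w]


def conv1x1_bc1s(x, w4, b):
    """1x1 convolution on [B][C_in][1][S]; w4 is [C_out][C_in][1][1]."""
    out = []
    for chans in x:
        c_in, seq_len = len(chans), len(chans[0][0])
        out.append([
            [[sum(w4[o][i][0][0] * chans[i][0][s] for i in range(c_in)) + b[o]
              for s in range(seq_len)]]
            for o in range(len(w4))
        ])
    return out


# Usage: B=1, S=2 tokens, C_in=3 input channels, C_out=2 output channels.
x = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
w = [[0.5, -1.0, 2.0], [1.0, 0.0, -0.5]]
b = [0.1, -0.2]

y_linear = linear_bsc(x, w, b)                                   # [B][S][C_out]
y_conv = conv1x1_bc1s(to_bc1s(x), linear_weight_to_conv(w), b)   # [B][C_out][1][S]
print(y_conv == to_bc1s(y_linear))
```

The math is identical; the layout is what changes. Whether that layout actually pays off on a given iPhone is the empirical question this thread is asking.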

Since I plan to deploy transformers to iOS, I started thinking about this. I'm hoping some of you already have experience with it, so we can discuss:

  • Has anyone tried this themselves? Did you actually see performance improvements on iOS?
  • I'm using Hugging Face's transformer models in my experiments. How much work do you think it takes to rewrite a model in this optimized way?
  • Training transformers from scratch is very difficult (especially big ones :) ), so I'm fine-tuning pre-trained models from Hugging Face. Is it possible to use the weights of pre-trained Hugging Face models with Apple's reference code? How difficult is it?
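On the weights question: because the rewritten modules use 1x1 convolutions where Hugging Face uses nn.Linear, the pretrained Linear weights mostly need reshaping rather than retraining. As far as I can tell, Apple's DistilBERT example does something along these lines when loading a state dict. Below is a standalone sketch of the idea on a plain dict of nested lists standing in for a state dict. The layer names (q_lin, k_lin, v_lin, out_lin, lin1, lin2, classifier) follow Hugging Face's DistilBERT; the matching heuristic and function names are hypothetical and would need adapting for other architectures (BERT, for example, names its layers differently).

```python
# Sketch: adapt a Hugging Face DistilBERT-style state dict so Linear weights
# fit 1x1 Conv2d layers. Nested lists stand in for tensors; the key-matching
# rule is a hypothetical heuristic, not Apple's actual code.

LINEAR_TAGS = ("q_lin", "k_lin", "v_lin", "out_lin", "lin1", "lin2", "classifier")


def ndim(t):
    """Number of nesting levels of a nested-list 'tensor'."""
    d = 0
    while isinstance(t, list) and t:
        d += 1
        t = t[0]
    return d


def is_linear_weight(key, value):
    """Heuristic: a 2-D '.weight' tensor of a projection, FFN or classifier layer."""
    return (
        key.endswith(".weight")
        and any(tag in key for tag in LINEAR_TAGS)
        and ndim(value) == 2
    )


def unsqueeze_last_two(w):
    """(out, in) -> (out, in, 1, 1), like w[:, :, None, None] on a real tensor."""
    return [[[[v]] for v in row] for row in w]


def adapt_state_dict(state_dict):
    """Return a copy with matching Linear weights reshaped for Conv2d.

    Biases, embeddings and LayerNorm parameters pass through unchanged, and
    already-reshaped (4-D) weights are left alone, so this is idempotent.
    """
    return {
        key: unsqueeze_last_two(value) if is_linear_weight(key, value) else value
        for key, value in state_dict.items()
    }


# Usage with a tiny, made-up state dict (real keys hold much larger tensors).
sd = {
    "distilbert.transformer.layer.0.attention.q_lin.weight": [[1.0, 2.0], [3.0, 4.0]],
    "distilbert.transformer.layer.0.attention.q_lin.bias": [0.0, 0.0],
    "distilbert.embeddings.word_embeddings.weight": [[0.1, 0.2], [0.3, 0.4]],
}
new_sd = adapt_state_dict(sd)
print(new_sd["distilbert.transformer.layer.0.attention.q_lin.weight"])
```

With real PyTorch tensors the reshape would be `w[:, :, None, None]` applied before `load_state_dict`. This only covers the weight-shape side of the question, not checking that the rewritten model's outputs match the original.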

Comments


TheDeviousPanda t1_j6vv0my wrote

I hate to do this to you, but I have been in your position and I have answers to all your questions.

  • Yes, yes
  • A lot
  • Yes, very

alkibijad OP t1_j6w7lo3 wrote

That was not the answer I was hoping for, but very helpful :)
Do you have any code or repos to share? I've only been able to find the DistilBERT implementation in Apple's repo and would like to see some other examples.


vade t1_j6xeylg wrote

FWIW, a colleague of mine is working on this and is also hitting some hiccups. I've pointed them to this thread :)


Competitive-Rub-1958 t1_j6z8a7t wrote

I'm someone who simply wants to use the ANE (I haven't bought the hardware yet, just considering it) to test bare-bones models locally for research before training them in the cloud, since I find remote debugging quite frustrating. How good is the support in containerization solutions like Singularity? Does it even leverage the ANE?

I know the speedup won't be anything drastic, but if it helps (i.e., it's faster and more resource-efficient than the CPU/GPU), that still translates into a shorter time-to-iterate.

So for someone using plain PyTorch (with a few bells and whistles), how much of a pain would it be?
