Submitted by [deleted] t3_10z3qdt in MachineLearning
coolmlgirl t1_j819nx4 wrote
Can you share the link to that Hugging Face model so I can see how I may help?
askingforhelp1111 t1_j81ggm0 wrote
Sure, here are the links. Each one has an inference latency of roughly 4-9 seconds per request.
https://huggingface.co/poom-sci/WangchanBERTa-finetuned-sentiment
https://huggingface.co/ayameRushia/bert-base-indonesian-1.5G-sentiment-analysis-smsa
I call each checkpoint like this:

from transformers import pipeline

# checkpoint is the Hugging Face model ID, e.g. one of the links above
nlp = pipeline('sentiment-analysis',
               model=checkpoint,
               tokenizer=checkpoint)
Thank you!
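One quick sanity check before optimizing: time repeated calls and average them, excluding warmup, so one-off costs (model download, first-call initialization) don't inflate the 4-9 s figure. A minimal stdlib-only sketch; `run_once` is a hypothetical stand-in you'd replace with the actual `nlp(text)` call:

```python
import time

def avg_latency_ms(fn, warmup=2, iters=10):
    """Average wall-clock latency of fn() in milliseconds.

    Warmup calls are excluded so one-time costs (model load,
    first-call initialization) don't skew the average.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0

# Stand-in for nlp("some text"); swap in the real pipeline call.
def run_once():
    time.sleep(0.001)

print(f"avg latency: {avg_latency_ms(run_once):.2f} ms")
```

If the averaged number is far below 4-9 s, the slowness is mostly startup overhead rather than per-request compute.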
coolmlgirl t1_j8fmfpi wrote
I used the OctoML platform (https://octoml.ai/) to optimize your model and got the average inference latency down to 2.14ms on an AWS T4 GPU. On an Ice Lake CPU I can get it down to 27.47ms. I'm assuming shapes of [1, 128] for your inputs "input_ids", "attention_mask", and "token_type_ids", but I'd like to confirm your actual shapes so we're comparing apples to apples. Do you know what shapes you're using?
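For the OP: the [1, 128] shape comes from the tokenizer's padding/truncation settings -- with `padding='max_length', truncation=True, max_length=128`, every input becomes a [1, 128] tensor for each of `input_ids`, `attention_mask`, and `token_type_ids`. A stdlib-only sketch of that padding logic (the real work is done by the Hugging Face tokenizer; `pad_id=0` and the example token IDs are assumptions for illustration):

```python
def pad_to_max_length(token_ids, max_length=128, pad_id=0):
    """Mimic tokenizer(padding='max_length', truncation=True):
    truncate to max_length, then right-pad with pad_id.
    Returns (input_ids, attention_mask), each of length max_length.
    """
    ids = token_ids[:max_length]
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    return ids, mask

ids, mask = pad_to_max_length([101, 2023, 2003, 102])  # 4 real tokens
print(len(ids), len(mask))  # 128 128 -> one sentence gives shape [1, 128]
print(sum(mask))            # 4 non-pad positions
```

If you call the pipeline without fixed padding, shapes vary with sentence length, which matters for comparing optimized latencies.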
coolmlgirl t1_j8fml7y wrote
My results above are for this model: https://huggingface.co/ayameRushia/bert-base-indonesian-1.5G-sentiment-analysis-smsa
It's pretty easy to use the platform to do the same for your other model automatically -- we can discuss that one once we've sorted out this one.