Submitted by [deleted] t3_10z3qdt in MachineLearning
coolmlgirl t1_j8fmfpi wrote
Reply to comment by askingforhelp1111 in [D] Speed up HuggingFace Inference Pipeline by [deleted]
I used the OctoML platform (https://octoml.ai/) to optimize your model and got your average inference latency down to 2.14 ms on an AWS T4 GPU. On an Ice Lake CPU I can get your latency down to 27.47 ms. I assumed shapes of [1, 128] for your inputs "input_ids", "attention_mask", and "token_type_ids", but I'd like to confirm your actual shapes so that we're comparing apples to apples. Do you know what shapes you're using?
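To compare latency numbers apples to apples, you'd want to time the model the same way on your end. A minimal sketch of a wall-clock latency benchmark (the `avg_latency_ms` helper is hypothetical, not an OctoML API; in practice `fn` would wrap your HuggingFace pipeline call with inputs of the assumed [1, 128] shape):

```python
import time

def avg_latency_ms(fn, warmup=5, runs=50):
    """Return the average wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):      # warm caches/allocators before timing
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# In practice fn would invoke the model, e.g. (assumption, not run here):
#   pipe = transformers.pipeline("sentiment-analysis", model=model_name)
#   avg_latency_ms(lambda: pipe(some_text))
# A cheap stand-in workload keeps this sketch self-contained and runnable.
latency = avg_latency_ms(lambda: sum(range(10_000)))
print(f"avg latency: {latency:.3f} ms")
```

Averaging over many runs after a warmup matters: the first few inferences are typically dominated by one-time costs (weight loading, kernel compilation) and would skew a single-shot measurement.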
coolmlgirl t1_j8fml7y wrote
My results above are for this model: https://huggingface.co/ayameRushia/bert-base-indonesian-1.5G-sentiment-analysis-smsa
It's pretty easy to use the platform to automatically do the same for your other model too; we can discuss that one later once we've figured out this one.