askingforhelp1111 t1_j81hia1 wrote
Reply to comment by gingerbread42 in [D] Speed up HuggingFace Inference Pipeline by [deleted]
Thanks for the idea!
askingforhelp1111 t1_j81ggm0 wrote
Reply to comment by coolmlgirl in [D] Speed up HuggingFace Inference Pipeline by [deleted]
Sure, here are a couple of links. Both take 4-9 seconds per inference.
https://huggingface.co/poom-sci/WangchanBERTa-finetuned-sentiment
https://huggingface.co/ayameRushia/bert-base-indonesian-1.5G-sentiment-analysis-smsa
I call each checkpoint like this:

from transformers import pipeline

checkpoint = 'poom-sci/WangchanBERTa-finetuned-sentiment'  # or the other model id above
nlp = pipeline('sentiment-analysis',
               model=checkpoint,
               tokenizer=checkpoint)
Thank you!
askingforhelp1111 t1_j8cmbr6 wrote
Reply to comment by machineko in [D] Speed up HuggingFace Inference Pipeline by [deleted]
Many thanks for the reply; I'd love to read your resources on compression and inference.
I'm keen on cutting costs. We previously ran on a GPU via an AWS EC2 instance, but we have to tighten the company's belt this year, so my manager suggested running on CPU. I'd love to hear your suggestions too (if any).
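One common trick for cheaper CPU inference is PyTorch dynamic quantization, which swaps the model's Linear layers for int8 versions. Below is a minimal sketch on a toy module (the toy layer sizes are just illustrative); the same `torch.quantization.quantize_dynamic` call can be applied to a model loaded with `AutoModelForSequenceClassification.from_pretrained(checkpoint)` before handing it to `pipeline(...)` - results may vary by model, so benchmark speed and accuracy on your own data.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer classifier head: dynamic quantization
# targets nn.Linear layers, which dominate transformer compute.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))
model.eval()

# Weights are quantized to int8 once; activations are quantized
# on the fly at inference time. CPU-only technique.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 768))
print(out.shape)
```

The usual trade-off is a small accuracy drop for a noticeably smaller and faster model on CPU, so it is worth validating on a held-out set before deploying.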