Submitted by [deleted] in MachineLearning
askingforhelp1111 wrote
Reply to comment by machineko in [D] Speed up HuggingFace Inference Pipeline by [deleted]
Thanks so much for the reply; I'd love to read your resources on compression and inference.
I'm keen on cutting costs. We previously ran on GPU via an AWS EC2 instance, but we've got to tighten the company's belt this year, so my manager suggested running on CPU. I'd love to hear your suggestions too (if any).
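In case it helps frame the question, here's the direction I'm exploring: a minimal sketch of PyTorch dynamic quantization for CPU inference. The checkpoint name is just an example, and I haven't benchmarked this particular setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Example checkpoint only -- swap in your own model.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Quantize the Linear layers to int8: shrinks the model and
# often speeds up CPU inference, at a small accuracy cost.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# device=-1 runs the pipeline on CPU.
pipe = pipeline(
    "text-classification",
    model=quantized_model,
    tokenizer=tokenizer,
    device=-1,
)
print(pipe("This runs on CPU with a smaller, quantized model."))
```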
machineko wrote
It depends on which models you're using, but for most transformer models, running on GPUs can be much more cost-efficient than CPUs once you compare dollars per million inferences ($/M inf, or equivalently inferences per $).
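As a back-of-the-envelope sketch of what I mean (every number below is a made-up placeholder; plug in your own measured throughput and current AWS on-demand prices):

```python
# Hypothetical instance prices ($/hr) and measured throughputs (inferences/sec).
gpu_price_per_hr = 0.53        # placeholder, e.g. a small GPU instance
cpu_price_per_hr = 0.17        # placeholder, e.g. a small CPU instance
gpu_inferences_per_sec = 200   # placeholder: measure your own model
cpu_inferences_per_sec = 20    # placeholder: measure your own model

def dollars_per_million(price_per_hr: float, inf_per_sec: float) -> float:
    """Cost to serve one million inferences at the given throughput."""
    seconds_needed = 1_000_000 / inf_per_sec
    return price_per_hr * seconds_needed / 3600

print(f"GPU: ${dollars_per_million(gpu_price_per_hr, gpu_inferences_per_sec):.2f} / M inf")
print(f"CPU: ${dollars_per_million(cpu_price_per_hr, cpu_inferences_per_sec):.2f} / M inf")
```

With these placeholder numbers the GPU comes out cheaper per million inferences despite the higher hourly rate, because throughput dominates the math. Your actual numbers may differ, so measure before deciding.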
Are there specific EC2 instance types you have to use, or can you deploy on any EC2 instance?