BestSentence4868 t1_j1bjak3 wrote
Run int8 instead of fp16. GG, RIP.
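For what that looks like in practice, here's a minimal sketch of dropping to int8 with PyTorch's dynamic quantization; the toy model and layer choice are placeholders, not whatever was actually running in the thread.

```python
import torch
import torch.nn as nn

# Illustrative model; stand-in for whatever was too big in fp16.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2)).eval()

# Dynamic quantization stores Linear weights as int8 and quantizes
# activations on the fly, cutting weight memory roughly 2x vs fp16
# with basically no code changes.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = quantized(torch.randn(1, 768))
print(out.shape)
```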
BestSentence4868 t1_iu8h0kj wrote
Reply to comment by big_dog_2k in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Feel free to DM me with any further questions.
BestSentence4868 t1_iu8gi68 wrote
Reply to comment by big_dog_2k in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Yep! Fire up Triton (I used their Docker container), install PyTorch via pip or just add it to the Dockerfile, and you're off to the races! I actually just deployed Triton + PyTorch + Flask for a web app this week :)
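Roughly, the Flask side just forwards requests to Triton over HTTP with `tritonclient`. A minimal sketch, where the model name, tensor names, and shapes are placeholders you'd match to your own `config.pbtxt`:

```python
import numpy as np
import tritonclient.http as httpclient
from flask import Flask, jsonify, request

app = Flask(__name__)
# Triton's default HTTP port when launched from the NGC container.
triton = httpclient.InferenceServerClient(url="localhost:8000")

@app.route("/predict", methods=["POST"])
def predict():
    # Placeholder tensor names/dtypes; match them to your model config.
    data = np.array(request.json["inputs"], dtype=np.float32)
    inp = httpclient.InferInput("INPUT__0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    out = httpclient.InferRequestedOutput("OUTPUT__0")
    result = triton.infer(
        model_name="my_torchscript_model", inputs=[inp], outputs=[out]
    )
    return jsonify({"outputs": result.as_numpy("OUTPUT__0").tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```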
BestSentence4868 t1_iu8fluq wrote
Reply to [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Love, love, love Triton Inference Server for the framework flexibility. It's super mature, with so much stuff I never thought I'd need, like model warmup. The ORT and TensorRT backends are cool, but if all else fails, the Python backend is awesome.
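To show what the Python backend looks like, here's a minimal `model.py` sketch in the shape Triton expects; the tensor names and the toy computation are placeholders, and a real model would load its weights in `initialize`.

```python
# model.py for Triton's Python backend; lives under
# models/<model_name>/1/model.py next to a config.pbtxt.
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Load weights, tokenizers, etc. here; runs once per model instance.
        pass

    def execute(self, requests):
        responses = []
        for req in requests:
            # "INPUT0" / "OUTPUT0" are placeholder names from config.pbtxt.
            x = pb_utils.get_input_tensor_by_name(req, "INPUT0").as_numpy()
            y = x.astype(np.float32) * 2.0  # toy computation
            out = pb_utils.Tensor("OUTPUT0", y)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def finalize(self):
        # Optional cleanup hook.
        pass
```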
BestSentence4868 t1_j1cackq wrote
Reply to comment by _underlines_ in [D] When chatGPT stops being free: Run SOTA LLM in cloud by _underlines_
Is DeepSpeed-MII publicly available, or only on Azure?