I'm looking to run some of the bigger models (LLaMA 30B, 65B, namely) on a cloud instance so that I can have some useable performance for completions. I was thinking of an EC2 instance with a single A100 attached, but is this the best setup, or does anyone have any other suggestions?

Comments

You must log in or register to comment.

I_will_delete_myself t1_jb33bmz wrote on March 6, 2023 at 1:41 AM

Use a spot instance. If you testing it out you wallet will thank you later. Look at my previous post on here about running stuff in the cloud before you do it.

Mrkvitko t1_jb6gf6c wrote on March 6, 2023 at 8:09 PM

I just got instance at 8X RTX A5000 for a couple of bucks per hour. on https://vast.ai

I must say LLaMA 65B is a bit underwhelming...

maizeq t1_jb90rkr wrote on March 7, 2023 at 9:43 AM

Underwhelming how?

iloveintuition t1_jb4bp1o wrote on March 6, 2023 at 9:28 AM

Using vast.ai for running flan-xl, works pretty well. Haven't tested on LLama scale.

shayanrc t1_jb4vbcz wrote on March 6, 2023 at 1:28 PM

What config did you use?

isaeef t1_jb4iz4f wrote on March 6, 2023 at 11:15 AM

or you could use any gpu workload specific provider https://www.paperspace.com/

trnka t1_jb5f89k wrote on March 6, 2023 at 3:58 PM

Related, there's a talk on Thursday about running LLMs in production. I think the hosts have deployed LLMs in prod so they should have good advice

[deleted] t1_jb3t91m wrote on March 6, 2023 at 5:27 AM

[removed]

ggf31416 t1_jb4j0uk wrote on March 6, 2023 at 11:15 AM

Good luck getting a EC2 with a single A100, last time I checked, AWS only offered instances with 8 of them at a high price.

l0g1cs t1_jb4tbu7 wrote on March 6, 2023 at 1:10 PM

Check out Banana. They seem to do exactly that with "serverless" A100.

frankod281 t1_jb5j6si wrote on March 6, 2023 at 4:24 PM

Maybe check datacrunch.io they have a good offering for cloud GPU.

Quick-Hovercraft-997 t1_jbx9gcj wrote on March 12, 2023 at 12:52 PM

if latency is not a critical requirement, you can try serverless GPU cloud like banana.dev, pipeline.ai . These platform provide an easy to use template for deploying LLM.

itsnotmeyou t1_jb2z8z7 wrote on March 6, 2023 at 1:08 AM

Are you using these as in a system? For just experimenting around, ec2 is good option. But you would either need to install right drivers or use latest deep learning ami. Another option could be using a custom docker setup on sagemaker. I like that setup for inference as it’s super easy to deploy and separates model from inference code. Though it’s costlier and would be available through sagemaker runtime.

Third would be whole over engineering via setting up your own cluster service.

In general if you want to deploy multiple llm quickly go for sagemaker

itsnotmeyou t1_jb2zfbq wrote on March 6, 2023 at 1:09 AM

On a side note sagemaker was not supporting shm-size so might not work for large lm

pyonsu2 t1_jb3y5ps wrote on March 6, 2023 at 6:23 AM

maybe, Colab Pro+?