Submitted by MBle t3_11v1eu7 in MachineLearning

Hi,

Is there any way to run the LLaMA (or any other) model in such a way that you only pay per API request?

I wanted to test how the LLaMA model would do in my specific use case, but when I went to HF Inference Endpoints it said I would have to pay over 3k USD per month (ofc I do not have that much money to spend on a side project).

I would like to test this model by paying on a per-request basis.

0

Comments


currentscurrents t1_jcqzjil wrote

I haven't heard of anybody running LLaMA as a paid API service. I think doing so might violate the license terms against commercial use.

>(or any other) model

OpenAI has a ChatGPT API that costs pennies per request. Anthropic also recently announced one for their Claude language model but I have not tried it.
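
For reference, a minimal example of a pay-per-request call with the openai Python package (pre-1.0 interface; assumes OPENAI_API_KEY is set in your environment):

```python
import openai  # reads OPENAI_API_KEY from the environment

# One chat completion; you are billed only for the tokens
# in this request and its response.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, world"}],
)
print(response["choices"][0]["message"]["content"])
```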

5

veonua t1_jct419t wrote

Creating a monopoly on AI would be extremely risky. OpenAI was founded to prevent exactly that, yet its recent actions, such as aggressive price cuts, suggest the company may be contributing to monopolization itself.

4

danielbln t1_jctk7sy wrote

Pennies per request would be a lot; it's a fraction of a penny per request.
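
Back-of-the-envelope, at gpt-3.5-turbo's launch price of $0.002 per 1K tokens:

```python
# Rough cost of one short exchange at $0.002 per 1K tokens
# (gpt-3.5-turbo's price at the time of this thread).
tokens = 500                       # prompt + completion combined
cost = tokens / 1000 * 0.002       # USD
print(f"${cost:.4f} per request")  # $0.0010, i.e. a tenth of a cent
```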

1

Philpax t1_jcrgxbb wrote

As the other commenter said, it's unlikely anyone will advertise a service like this, since LLaMA's license terms don't allow for it. In your situation, I'd just rent a cloud GPU server (Lambda Labs, etc.) and test the models you care about. It'll only end up costing a dollar or two if you're quick about it.
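
A minimal sketch of what that one-off test could look like, assuming you've already converted the weights to Hugging Face format with the conversion script in the transformers repo (the ./llama-7b-hf path is just a placeholder):

```python
# One-off evaluation on a rented GPU. Assumes torch, transformers,
# and accelerate are installed, and LLaMA weights converted to HF format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./llama-7b-hf"  # hypothetical path to the converted weights
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # fp16 fits the 7B model in ~14 GB of VRAM
    device_map="auto",          # needs the accelerate package
)

inputs = tokenizer("My use case prompt goes here:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```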

2

NotARedditUser3 t1_jcsc9lp wrote

You can get LLaMA running on consumer-grade hardware. There are 4-bit and 8-bit quantized versions of it floating around here, I believe, that fit in a normal GPU's VRAM.
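
If you go the 8-bit route, loading is a one-flag change in transformers (a sketch, assuming the bitsandbytes and accelerate packages are installed and the same hypothetical converted-weights path as above):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "./llama-7b-hf",    # hypothetical path to HF-format LLaMA weights
    load_in_8bit=True,  # bitsandbytes int8: ~7 GB of VRAM for 7B vs ~14 GB in fp16
    device_map="auto",
)
```

The 4-bit variants come from community GPTQ ports rather than stock transformers, as far as I know.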

1

veonua t1_jct3plc wrote

As far as I know, the Meta license forbids this, since the model is for academic purposes only.

1