Submitted by learningmoreandmore t3_1077ni4 in MachineLearning
- In terms of cost, effort, and performance, does it make more sense to just pay for the OpenAI API and use a cheaper GPT-3 model to lower business costs? My biggest concern is having my entire business reliant on a third-party API, even more so than the cost of using the model.
- How good is it at writing short stories? If there are open-source alternatives that do this better, or at a similar level but with fewer resources, what are they?
- How resource-expensive is it to run locally? These are my laptop's specs: 16.0 GB of RAM, AMD Ryzen 7 5800H with Radeon Graphics, 3.20 GHz.
- How would I approach fine-tuning it? Are there any resources that go through the process step by step? Currently, in my mind, I just need to feed it a large free-to-use dataset of stories and wait about a day, but I have no expertise in this area.
- If I want to incorporate it into a website with an API that takes prompts from users, are there any costs I should account for? Is there a way to minimize them? For example, is there a specific API setup or a one-time cost, like an expensive machine to host it locally and take prompts, that I could be using?
- Are there any concerns I should have when scaling it for users, such as costs and slow response times? Also, is there a cap on the number of requests it can handle, or is that just limited by what my own machine can handle?
Tuggummii t1_j3kyf2w wrote
I'm not a professional, but I can answer some of your questions as my personal opinion.
How good is it at writing short stories?
- I don't think GPT-J is dramatically better than the others, especially for text generation. I often see hallucinated, illogical, or incoherent output. If you want results like OpenAI's Davinci-003, you may be disappointed even after fine-tuning.
How resource-expensive is it to use locally?
- You need 40GB+ of RAM if you're running on CPU. A friend of mine failed with 32GB of RAM; she had to increase her swap space, and even then it succeeded only with an extremely slow loading time (almost 7~8 minutes). If you want GPU power, running in float16 needs 32GB+ of VRAM (though I've seen someone run it on 24GB). A CPU generates text from a prompt in 30~45 seconds, whereas a GPU generates text from the same prompt in 3 to 5 seconds.
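
A rough sanity check on those numbers: GPT-J has about 6 billion parameters, so the weights alone take roughly 4 bytes each in float32 and 2 bytes each in float16, before any loading overhead, activations, or KV cache. A minimal sketch of the arithmetic (the exact parameter count and overhead factors here are approximations, not figures from the thread):

```python
# Back-of-the-envelope memory estimate for GPT-J-6B weights.
# NUM_PARAMS is approximate; the real count is "about 6B" per EleutherAI.
NUM_PARAMS = 6_000_000_000

def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """GiB needed just to hold the raw weights at the given precision."""
    return num_params * bytes_per_param / 1024**3

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32: ~22 GiB of weights
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: ~11 GiB of weights

print(f"fp32 weights: {fp32_gib:.1f} GiB, fp16 weights: {fp16_gib:.1f} GiB")
```

This is consistent with the reported experience above: ~22 GiB of fp32 weights plus loading overhead explains why 32GB of RAM can fail on CPU, and ~11 GiB of fp16 weights explains why a 24GB GPU can work. With the Hugging Face `transformers` library, the fp16 path is typically `AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)`.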