Submitted by manuelfraile t3_12327d1 in MachineLearning
I've recently discovered models such as ChatLLaMA that let you build your own "ChatGPT", but they require Meta's LLaMA weights (yes, you can find them in torrents, but that's not the point of the question). Similar limitations come up in other cases.
So I wanted to find a fully open-source stack: a dataset (beyond what's already on Hugging Face), a "base model", and a "chat model" that is feasible to train on a commercial computer with a very good GPU (NVIDIA, etc.) and still gives at least decent results.
It would also be interesting to distinguish between solutions with commercial-use restrictions and those without.
Thanks!
• EDIT • A first solution I've already found is Dolly (https://github.com/databrickslabs/dolly), which fine-tunes GPT-J-6B (https://huggingface.co/EleutherAI/gpt-j-6B), but I'm still looking for discussion and perhaps other/better solutions.
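To give an idea of what I mean by "feasible on a commercial computer": load the GPT-J-6B base model in half precision and train only small LoRA adapters instead of the full model. This is just a sketch assuming the Hugging Face transformers, peft, and accelerate libraries; it is not Dolly's own training setup, and the hyperparameters are only illustrative.

```python
# Rough sketch: load the GPT-J-6B base model and attach LoRA adapters so that
# fine-tuning fits on a single consumer GPU. Assumes the Hugging Face
# `transformers`, `peft`, and `accelerate` libraries are installed; the LoRA
# hyperparameters below are illustrative, not what Dolly itself uses.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

model_name = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision: ~6B params take roughly 12 GB
    device_map="auto",          # let accelerate place the layers on the GPU
)

# Train only small low-rank adapter matrices on the attention projections,
# leaving the 6B base weights frozen.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # a few million trainable params vs ~6B frozen
```

From there you would feed an instruction-tuning dataset through a standard Trainer loop; whether that actually gives "decent results" on consumer hardware is exactly the kind of experience I'm hoping to hear about.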
Hands0L0 t1_jdsvv9h wrote
You and everyone else here