Submitted by noellarkin t3_10rhprm in MachineLearning

To clarify, I'm not talking about ChatGPT here. I've been testing outputs from GPT-3 davinci-003 against alternatives in terms of output quality, relevance, and ability to follow "instruct"-style prompts (versus vanilla autocompletion).

I tried these:

- AI21 Jurassic 178B
- GPT-NeoX 20B
- GPT-J 6B
- FairSeq 13B

As well as:

- GPT-3 davinci-002
- GPT-3 davinci-001

Of course, I didn't expect the smaller models to be on par with GPT-3, but I was surprised at how much better GPT-3 davinci-003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to GPT-3 davinci-001.
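
For reference, below is a rough sketch of the kind of side-by-side instruct-prompt test I mean, using the (pre-1.0) OpenAI Python client's Completion endpoint for the davinci variants. The prompt and parameters here are just illustrative placeholders, not my exact setup, and the open models were queried through their own APIs or weights.

```python
# Illustrative only: running the same instruct-style prompt through the
# three davinci variants. Assumes the pre-1.0 openai package and an
# OPENAI_API_KEY environment variable.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompt = "Rewrite the following sentence in a formal tone: gonna grab food, brb."

for model in ["text-davinci-003", "text-davinci-002", "text-davinci-001"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        temperature=0.7,
    )
    # Print each model's completion side by side for manual comparison.
    print(f"{model}: {response.choices[0].text.strip()}")
```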

Does this mean that only well-funded corporations will be able to train general-purpose LLMs? It seems to me that just having a large model doesn't do much; it's also about several iterations of training and feedback. How are open-source alternatives going to be able to compete?

(I'm not in the ML or CS field, just an amateur who enjoys using these models)

13

Comments


koolaidman123 t1_j6xd93p wrote

None of the models you listed are instruction-tuned, so it's no surprise that GPT-3 performs better.

Some better models are GPT-JT and Flan-T5 11B, probably the best open-source options right now. Maybe OPT-IML as well?
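
If you want to try one of them, here's a minimal sketch with Hugging Face transformers, assuming the google/flan-t5-xxl checkpoint (the ~11B one) and enough memory to load it:

```python
# Minimal sketch: prompting FLAN-T5 XXL (~11B parameters) via transformers.
# Loading the full checkpoint needs a lot of RAM/VRAM; this just shows the API.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "google/flan-t5-xxl"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Explain in one sentence why instruction tuning matters more than model size."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```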

7

Franck_Dernoncourt t1_j6ydkiu wrote

> I was surprised at how much better GPT-3 davinci-003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to GPT-3 davinci-001.

on which tasks?

> Of course, I didn't expect the smaller models to be on par with GPT-3

You could read Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto. "Benchmarking Large Language Models for News Summarization." arXiv:2301.13848:

> we find instruction tuning, and not model size, is the key to the LLM’s zero-shot summarization capability

6

Single_Blueberry t1_j6x9v3s wrote

> Does this mean that only well-funded corporations will be able to train general-purpose LLMs?

No, they are just always a couple years ahead.

That's not just a thing with language models, or even ML; it's like that with many technologies.

5

sgramstrup t1_j6y7any wrote

In an exponential future, models from big corps will feel like they're light-years beyond open models. Their resources will always be larger, and they will keep accelerating faster on the exponential curve.

2

Acceptable-Cress-374 t1_j6yil6g wrote

> Their resources will always be larger, and they will keep accelerating faster on the exponential curve.

Sure, they'll have more money to throw at a problem, but also more incentive to throw that money into other money-making stuff. Open-source models might not necessarily go the same path, and even if under-trained or less-optimized, they might still be a tremendous help once a community gets to play with them.

1

visarga t1_j6x8zna wrote

I think open-source implementations will eventually get there. They probably need much more multi-task and RLHF data, or they had too little code in the initial pre-training. Training GPT-3.5-like models is like following a recipe, and the formula and ingredients are gradually becoming available.

3

Hyper1on t1_j6xn5do wrote

> AI21's Jurassic 178B seems to be comparable to GPT3 davinci 001.

This is actually a compliment to AI21, since davinci-001 is fine-tuned from the original 175B davinci on human feedback over its generations:

https://platform.openai.com/docs/model-index-for-researchers

The better comparison is with plain davinci, and you would expect 001 to be better and 003 to be significantly better (the latter is trained with RLHF).

There are currently no open-source RLHF models to compete with davinci-003, but this will change in 2023.

1

saturn_since_day1 t1_j7hcyfl wrote

They would have to offer a free service paid for by ads and/or by selling the resulting training data to the big corps.

1