Submitted by noellarkin t3_10rhprm in MachineLearning
To clarify, I'm not talking about ChatGPT here. I've been testing outputs from GPT-3 davinci-003 against alternatives in terms of output quality, relevance, and ability to follow "instruct"-style prompts (versus vanilla autocompletion).
I tried these:

- AI21 Jurassic-1 178B
- GPT-NeoX 20B
- GPT-J 6B
- FairSeq 13B

As well as:

- GPT-3 davinci-002
- GPT-3 davinci-001
Of course, I didn't expect the smaller models to be on par with GPT-3, but I was surprised at how much better davinci-003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to davinci-001.
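To give a sense of what I mean by the "instruct" testing, here's a rough sketch (not my actual test harness) of sending the same instruction-style prompt to the different davinci versions, using the legacy `openai` Python client (pre-1.0 Completion API); the prompt, key, and generation settings are just placeholders:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, use your own key

PROMPT = "Summarize the following paragraph in one sentence:\n\n<paragraph here>"

for model in ["text-davinci-003", "text-davinci-002", "text-davinci-001"]:
    # Same instruction-style prompt for each model, so any difference in how well
    # the instruction is followed comes from the model itself.
    resp = openai.Completion.create(
        model=model,
        prompt=PROMPT,
        max_tokens=100,
        temperature=0.7,
    )
    print(f"--- {model} ---")
    print(resp["choices"][0]["text"].strip())
```

Then I just compare the outputs by hand for quality and relevance.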
Does this mean that only well-funded corporations will be able to train general-purpose LLMs? It seems to me that just having a large model doesn't do much; it's also about several iterations of training and feedback. How are open-source alternatives going to be able to compete?
(I'm not in the ML or CS field, just an amateur who enjoys using these models)
koolaidman123 t1_j6xd93p wrote
None of the models you listed are instruction-tuned, so it's no surprise that GPT-3 performs better.
Some better options are GPT-JT and Flan-T5 11B (probably the best open-source models right now), and maybe OPT-IML.
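If you want to try one of those, here's a minimal sketch of running an instruction prompt through Flan-T5 XXL (the 11B checkpoint) with Hugging Face transformers; the checkpoint name is the public `google/flan-t5-xxl`, everything else (prompt, generation settings) is just an example:

```python
# Minimal sketch: one instruction prompt through Flan-T5 XXL (~11B params).
# Needs `transformers` and `accelerate`; device_map="auto" spreads the weights
# across whatever GPUs/CPU memory you have.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-xxl"  # the 11B Flan-T5 checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

prompt = "Summarize the following paragraph in one sentence:\n\n<paragraph here>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

It won't match davinci-003, but it follows instructions far better than the base models you tested.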