_Arsenie_Boca_ t1_itwt4ls wrote

I don't think any model you can run on a single commodity GPU will be on par with GPT-3. Perhaps GPT-J, OPT-{6.7B / 13B}, and GPT-NeoX-20B are the best alternatives. Some might need significant engineering (e.g. DeepSpeed) to work with limited VRAM.

4

deeceeo t1_itxiswn wrote

UL2 is 20B and supposedly on par with GPT-3?

2

_Arsenie_Boca_ t1_ityby3b wrote

True, I forgot about that one. Although getting a 20B model (NeoX-20B or UL2-20B) to run on an RTX GPU is probably a big stretch.
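A back-of-the-envelope sketch of why 20B parameters is a stretch for a single RTX card (assumptions: weights-only memory, ignoring activations, KV cache, and framework overhead, and a 24 GB card like an RTX 3090/4090 as the reference):

```python
# Rough VRAM needed just to hold a model's weights.
def weight_vram_gb(n_params_billion: float, bytes_per_param: int) -> float:
    """Weights-only footprint in GiB; real usage is higher."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

fp16 = weight_vram_gb(20, 2)  # ~37 GiB: over any single consumer card
int8 = weight_vram_gb(20, 1)  # ~19 GiB: borderline on 24 GB, before overhead
print(f"fp16: {fp16:.1f} GiB, int8: {int8:.1f} GiB")
```

So even with 8-bit quantization a 20B model barely fits in 24 GB once activations and overhead are counted, which is why offloading tricks (e.g. DeepSpeed) come up.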

1

AuspiciousApple OP t1_itwvq1f wrote

>I don't think any model you can run on a single commodity GPU will be on par with GPT-3.

That makes sense. I'm not an NLP person, so I don't have a good intuition for how these models scale or what the benchmark numbers actually mean.

In CV, the difference between a small and a large model might be a few percentage points of accuracy on ImageNet, but even the small models work reasonably well. FLAN-T5-XL seems to generate nonsense 90% of the time for the prompts I've tried, whereas GPT-3 produces great output most of the time.

Do you have any experience with these open models?

1

_Arsenie_Boca_ t1_ityccjh wrote

I don't think there is a fundamental difference between CV and NLP. However, we expect language models to be much more generalist than any vision model (have you ever seen a vision model that performs well on discriminative and generative tasks across domains without fine-tuning?). I believe this is where scale is the enabling factor.

1