Submitted by besabestin t3_10lp3g4 in MachineLearning
manubfr t1_j5y6wko wrote
Google (and DeepMind) actually have better LLM tech and models than OpenAI (if you believe their published research, anyway). They had a significant breakthrough last year in terms of scalability: https://arxiv.org/abs/2203.15556
Existing LLMs were found to be undertrained, and with some tweaks you can create a smaller model that outperforms larger ones. Chinchilla is arguably the most performant model we've heard of to date ( https://www.jasonwei.net/blog/emergence ) but it hasn't been pushed to any consumer-facing application AFAIK.
This should be powering their ChatGPT competitor Sparrow, which might be released this year. I am pretty sure that OpenAI will also implement those ideas for GPT-4.
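For anyone who wants to see the "smaller model, more data" idea in numbers, here is a rough sketch. It assumes two common rules of thumb associated with the Chinchilla paper: training compute is approximately 6 × parameters × tokens, and the compute-optimal ratio is roughly 20 tokens per parameter. Both constants are approximations, and the function name is made up for illustration.

```python
import math

# Assumptions (rules of thumb, not exact values from the paper):
TOKENS_PER_PARAM = 20.0       # approximate compute-optimal tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ~ 6 * params * tokens


def compute_optimal_split(flops):
    """Return (params, tokens) that spend `flops` at ~20 tokens per parameter."""
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# Budget of a Gopher-sized run: 280B parameters trained on 300B tokens.
gopher_flops = FLOPS_PER_PARAM_TOKEN * 280e9 * 300e9

params, tokens = compute_optimal_split(gopher_flops)
print(f"{params / 1e9:.0f}B params, {tokens / 1e12:.2f}T tokens")
```

Under these assumptions the same budget comes out at a model in the 60-70B range trained on over a trillion tokens, which is roughly the shape of Chinchilla (70B parameters, 1.4T tokens).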
Dendriform1491 t1_j5ywgiz wrote
Also, Google doesn't use GPUs, they designed their own cards which they call TPUs.
TPUs are ASICs designed specifically for machine learning. They don't have any graphics-related components, they are cheaper to make, they use less energy, and Google can make as many as it wants.
cdsmith t1_j5z0rrm wrote
You don't have to be Google to use special-purpose hardware for machine learning, either. I work for a company (Groq) that makes a machine learning acceleration chip available to anyone. Groq has competitors, like SambaNova and Cerebras, with different architectures.
Taenk t1_j60gdbl wrote
Do these also increase inference speed? How much work is it to switch from CUDA-based software to one of these?
cdsmith t1_j60q0bs wrote
I can only answer about Groq. I'm not trying to sell you Groq hardware, honestly... I just honestly don't know the answers for other accelerator chips.
Groq very likely increases inference speed and power efficiency over GPUs; that's actually its main purpose. How much depends on the model, though. I'm not in marketing so I probably don't have the best resources here, but there are some general performance numbers (unfortunately no comparisons) in this article, and this one talks about a very specific case where a Groq chip gets you a 1000x inference performance advantage over the A100.
To run a model on a Groq chip, you would typically start before CUDA enters the picture at all, and convert from PyTorch, Tensorflow, or a model in several other common formats into a Groq program using https://github.com/groq/groqflow. If you have custom-written CUDA code, then it's likely you've got some programming work ahead of you to run on something besides a GPU.
lucidrage t1_j61so7l wrote
>convert from PyTorch, Tensorflow, or a model in several other common formats into a Groq program
Is there any effort being spent on adding a plugin for a high-level framework like Keras to automatically use Groq?
cdsmith t1_j62a3yv wrote
I'm not aware of any effort to build it into Keras, but Keras models are one of the things you can easily convert to Groq chips using groqflow.
gradientpenalty t1_j61tko2 wrote
Okay, so where can I buy it as a small startup for under 10k without signing an NDA to use your proprietary compiler? As far as I can see, we are all still stuck with Nvidia after 10B of funding for all these "AI" hardware startups.
cdsmith t1_j626c0c wrote
I honestly don't know the price or terms of use, for this or any other company. I'm not in sales or marketing at all. I said you don't need to be Google; obviously you have to have some amount of money, whether you're buying a GPU or some other piece of hardware.
CKtalon t1_j5y87e5 wrote
People often quote Chinchilla about performance, claiming that there's still a lot of performance to be unlocked when we do not know how GPT 3.5 was trained. GPT 3.5 could very well be Chinchilla-optimal, even though the 1st version of davinci was not Chinchilla-optimal. We know that OpenAI has retrained GPT 3 due to the increased context length going from 2048 to 4096 to the apparent 8000ish tokens for ChatGPT.
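To put a number on "not Chinchilla-optimal", here is a quick back-of-the-envelope check. It assumes the publicly reported figures for the original GPT-3 (175B parameters, about 300B training tokens) and for Chinchilla (70B parameters, 1.4T tokens), and treats roughly 20 tokens per parameter as the optimal ratio. Nothing comparable is public for GPT 3.5, so it is not included.

```python
# Publicly reported (params, training tokens); treat these as approximate.
models = {
    "GPT-3 davinci (original)": (175e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

# Tokens seen per parameter; ~20 is the commonly quoted Chinchilla-optimal ratio.
ratios = {name: tokens / params for name, (params, tokens) in models.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} tokens per parameter")
```

The original davinci comes out at under 2 tokens per parameter, an order of magnitude below the Chinchilla ratio.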
manubfr t1_j5y8mo0 wrote
You're right, it could be that 3.5 is already using that approach. I guess the emergent cognition tests haven't yet been published for GPT-3.5 (or have they?) so it's hard for us to measure performance as individuals. I guess someone could test text-davinci-003 on a bunch of cognitive tasks on the PlayGround but I'm far too lazy to do that :)
CKtalon t1_j5y9deu wrote
There's also the rumor mill that Whisper was used to gather a bigger text corpus from videos to train GPT 4.
FallUpJV t1_j5ya6t5 wrote
This is something that I often read, that other LLMs are undertrained, but how come the OpenAI one is the only one not to be? Datasets? Computing power?
MysteryInc152 t1_j60vz8p wrote
OpenAI's models are still undertrained as well.