Submitted by besabestin t3_10lp3g4 in MachineLearning
Taenk t1_j60gdbl wrote
Reply to comment by cdsmith in Few questions about scalability of chatGPT [D] by besabestin
Do these also increase inference speed? How much work is it to switch from CUDA based software to one of these?
cdsmith t1_j60q0bs wrote
I can only answer about Groq. I'm not trying to sell you Groq hardware, honestly... I just honestly don't know the answers for other accelerator chips.
Groq very likely increases inference speed and power efficiency over GPUs; that's actually its main purpose. How much depends on the model, though. I'm not in marketing, so I probably don't have the best resources here, but there are some general performance numbers (unfortunately no comparisons) in this article, and this one describes a very specific case where a Groq chip gets you a 1000x inference performance advantage over the A100.
To run a model on a Groq chip, you would typically start before CUDA enters the picture at all, and convert from PyTorch, TensorFlow, or a model in several other common formats into a Groq program using https://github.com/groq/groqflow. If you have custom-written CUDA code, then you likely have some porting work ahead of you to run on anything besides a GPU.
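Roughly, the workflow looks like this. This is just a minimal sketch based on the examples in the groqflow repo; the `groqit` entry point and exact call signature may differ by version, and you need the Groq SDK and hardware available for the build and run steps to actually do anything:

```python
import torch
from groqflow import groqit  # from https://github.com/groq/groqflow

# A toy PyTorch model standing in for your real network.
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(128, 10)

    def forward(self, x):
        return self.fc(x)

model = TinyModel()
inputs = {"x": torch.randn(1, 128)}

# groqit() compiles the model into a Groq program; the returned
# object is then called much like the original model.
gmodel = groqit(model, inputs)
outputs = gmodel(**inputs)
```

The point is that the conversion starts from the framework-level model, not from any CUDA kernels, which is why hand-written CUDA doesn't carry over.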
lucidrage t1_j61so7l wrote
>convert from PyTorch, TensorFlow, or a model in several other common formats into a Groq program
Is there any effort being put into adding a plugin for a high-level framework like Keras to use Groq automatically?
cdsmith t1_j62a3yv wrote
I'm not aware of any effort to build it into Keras itself, but Keras models are one of the things you can easily convert to run on Groq chips using groqflow.
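For example, something like this. Again just a sketch mirroring the groqflow examples; the input naming and exactly which Keras model types are supported are assumptions you'd want to check against the repo:

```python
import tensorflow as tf
from groqflow import groqit

# A small Keras model standing in for whatever you've built.
keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(128,)),
    tf.keras.layers.Dense(10),
])

# The dict key here ("x") is an assumption; check how groqflow
# expects Keras inputs to be named in your version.
inputs = {"x": tf.random.uniform((1, 128))}

# Same entry point as for PyTorch: groqit() builds a Groq program
# from the Keras model, and the result is called like the original.
gmodel = groqit(keras_model, inputs)
outputs = gmodel(**inputs)
```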