light24bulbs t1_jc0s4wr wrote

−8

Kinexity t1_jc1lwah wrote

That is fast. We are literally talking about a high end laptop CPU from 5 years ago running a 30B LLM.

17

light24bulbs t1_jc2s2oc wrote

Oh, definitely, it's an amazing optimization.

But less than a token a second is going to be too slow for a lot of real-time applications like human chat.

Still, very cool.

2

Lajamerr_Mittesdine t1_jc5b99n wrote

I imagine 1 token per 0.2 seconds would be fast enough. That'd be equivalent to a 60 WPM typist.
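For anyone who wants to check the arithmetic, here is a rough sketch of the conversion from generation speed to words per minute. The `words_per_token` ratio is an assumption, not something measured on this model: 0.75 is a common rule of thumb for English text, while 0.2 corresponds to treating each token as roughly one character under the 5-characters-per-word typing convention, which is the case where 0.2 seconds per token lands on 60 WPM. The function name is made up for illustration.

```python
def tokens_to_wpm(seconds_per_token: float, words_per_token: float = 0.75) -> float:
    """Convert a generation speed into an approximate words-per-minute figure.

    words_per_token is an assumed ratio (hypothetical default of 0.75);
    the real value depends on the tokenizer and the text being generated.
    """
    if seconds_per_token <= 0:
        raise ValueError("seconds_per_token must be positive")
    tokens_per_minute = 60.0 / seconds_per_token
    return tokens_per_minute * words_per_token


# 0.2 s/token is 300 tokens per minute.
print(tokens_to_wpm(0.2))                       # with ~0.75 words per token
print(tokens_to_wpm(0.2, words_per_token=0.2))  # if a token is ~1 character
```

So depending on how many words a token actually covers, 0.2 seconds per token could feel anywhere from typing speed to comfortably faster than most people read aloud.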

Someone should benchmark it on an AMD Ryzen 9 7950X3D or Intel Core i9-13900KS.

1

light24bulbs t1_jc5e0zk wrote

Yeah, there's definitely a threshold in there where it's fast enough for human interaction. It's only an order of magnitude off, that's not too bad.

3