Viewing a single comment thread. View all comments

currentscurrents t1_jd10ab5 wrote

Llamma.cpp uses the neural engine, so does StableDiffusion. And the speed is not that far off from VRAM, actually.

>Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.

By comparison, the Nvidia 4090 is clocking in at ~1000GB/s

Apple is clearly positioning their devices for AI.

14

pier4r t1_jd39md4 wrote

> Llamma.cpp uses the neural engine

I am trying to find confirmation for this but I didn't. I saw some ports, but weren't from the LLaMa team. Do you have any source?

1