currentscurrents t1_jd10ab5 wrote on March 21, 2023 at 1:18 AM

llama.cpp uses the Neural Engine, and so does Stable Diffusion. And the unified memory bandwidth is actually not that far off from VRAM.

>Memory bandwidth is increased to 800GB/s, more than 10x the latest PC desktop chip, and M1 Ultra can be configured with 128GB of unified memory.

Apple is clearly positioning their devices for AI.

Straight-Comb-6956 t1_jd2iwp6 wrote on March 21, 2023 at 11:30 AM

> llama.cpp uses the Neural Engine,

Does it?

No, llama-mps uses the ANE.

> llama.cpp uses the Neural Engine

I tried to find confirmation for this but couldn't. I saw some ports, but they weren't from the LLaMA team. Do you have a source?