The_frozen_one t1_jbzqvwc wrote
Reply to comment by remghoost7 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
I'm running it with https://github.com/ggerganov/llama.cpp. The 4-bit quantized 13B model runs okay without GPU acceleration.
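For anyone curious, a minimal CPU-only invocation looks roughly like this. Model paths are illustrative, and the exact flags and quantize arguments come from the llama.cpp README of that era, so they may differ across versions:

```sh
# quantize the converted FP16 weights to 4-bit (the trailing 2 selected q4_0 in early versions)
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2

# run inference on CPU
# -m: quantized model path, -t: CPU threads, -n: tokens to predict (128 is the default), -p: prompt
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128 -p "Hello, I am"
```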
remghoost7 t1_jbzro03 wrote
Nice!
How's the generation speed...?
The_frozen_one t1_jbzv0gt wrote
With 13B, it takes about 7 seconds to generate a full response to a prompt at the default number of predicted tokens (128).
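(That works out to roughly 128 / 7 ≈ 18 tokens per second, assuming the response uses the full 128-token budget.)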