The_frozen_one t1_jbzqvwc wrote
Reply to comment by remghoost7 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
I'm running it with https://github.com/ggerganov/llama.cpp. The 4-bit quantized 13B model runs okay without GPU acceleration.
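For anyone curious, a minimal CPU-only invocation looks roughly like this. Model paths are illustrative, and the exact flags and quantize arguments come from the llama.cpp README of that era, so they may differ across versions:

```sh
# quantize the converted FP16 weights to 4-bit (the trailing 2 selected q4_0 in early versions)
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2

# run inference on CPU
# -m: quantized model path, -t: CPU threads, -n: tokens to predict (128 is the default), -p: prompt
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128 -p "Hello, I am"
```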
remghoost7 t1_jbzro03 wrote
Nice!
How's the generation speed...?
The_frozen_one t1_jbzv0gt wrote
With 13B, it takes about 7 seconds to generate a full response to a prompt at the default number of predicted tokens (128).
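(That works out to roughly 128 / 7 ≈ 18 tokens per second, assuming the response uses the full 128-token budget.)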