MorallyDeplorable t1_jc1umt7 wrote
Reply to comment by Necessary_Ad_9800 in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
I'm not actually sure. I've just been chatting with people about it in an unrelated Discord's off-topic channel.
I'd post some of what I've gotten from it, but I have no idea what I'm doing with it, and I don't think my results would be representative of what it can actually do.
MorallyDeplorable t1_jc0tuwg wrote
Reply to comment by 3deal in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
It got leaked, not officially released. I have the 30B model running in 4-bit here.
MorallyDeplorable t1_jc32jfw wrote
Reply to comment by 3deal in [P] Discord Chatbot for LLaMA 4-bit quantized that runs 13b in <9 GiB VRAM by Amazing_Painter_7692
It should, yeah. I'm running it on a 4090, which has the same amount of VRAM. It takes about 20-21 GB of VRAM.
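For context, here's a rough back-of-the-envelope sketch of why a 4-bit 30B model lands in that range. The parameter count and overhead allowance below are assumptions for illustration, not measurements from this setup:

```python
# Rough VRAM estimate for a 4-bit quantized model (illustrative sketch).

def estimate_vram_gib(n_params_billion: float, bits_per_weight: int = 4,
                      overhead_gib: float = 4.0) -> float:
    """Estimate VRAM for quantized weights plus runtime overhead.

    overhead_gib is a rough allowance for activations, the KV cache, and
    CUDA context; the real number depends on context length and batch size.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / (1024 ** 3) + overhead_gib

if __name__ == "__main__":
    # LLaMA "30B" is roughly 32.5B parameters. At 4 bits/weight that's
    # ~15 GiB of weights, plus a few GiB of overhead -> ~20 GiB total,
    # which is why it just fits on a 24 GiB card like a 4090 or 3090.
    print(f"~{estimate_vram_gib(32.5):.1f} GiB")
```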