KerfuffleV2 t1_jd7rjvf wrote, replying to Gatensio in [D] Running an LLM on "low" compute power machines? by Qwillbehr
There are quantized versions at 8-bit and 4-bit. The 4-bit quantized 30B version is about 18GB, so it will run on a machine with 32GB of RAM.
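For a rough sense of where that 18GB comes from, here's a back-of-the-envelope sketch (the real file is a few GB larger than the raw 4 bits/weight because quantization formats also store per-block scales and other metadata, so the effective rate is closer to ~5 bits/weight):

```python
# Back-of-the-envelope weight-storage estimate for a quantized model.
# Rough numbers only: real files add per-block scales/zero-points, and
# runtime needs extra RAM for the KV cache and activations.

def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 2**30 bytes)."""
    return n_params * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4, 5):
    print(f"30B @ {bits}-bit: ~{model_size_gb(30e9, bits):.1f} GB")
# 30B @ 16-bit: ~55.9 GB
# 30B @  8-bit: ~27.9 GB
# 30B @  4-bit: ~14.0 GB  (raw weights only)
# 30B @  5-bit: ~17.5 GB  (~4-bit plus metadata -> matches the ~18GB file)
```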
The bigger the model, the more tolerant it seems to be of quantization, so even 1-bit quantized models are in the realm of possibility (you'd probably need something like a 120B+ model for that to really work).
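To make "quantization" concrete, here's a minimal sketch of blockwise symmetric 4-bit quantization. This is my own illustration, loosely modeled on the idea behind ggml's q4_0 format, not the exact layout it uses on disk:

```python
import numpy as np

def quantize_4bit(w: np.ndarray, block: int = 32):
    """Quantize weights to 4-bit ints, one float scale per block."""
    w = w.reshape(-1, block)
    # Map each block's max magnitude onto the int range [-7, 7].
    scale = np.abs(w).max(axis=1, keepdims=True) / 7
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover approximate float weights from ints and scales."""
    return (q * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
err = np.abs(w - dequantize(q, s)).mean()
print(f"mean abs rounding error: {err:.4f}")
```

Each weight shrinks from 16 (or 32) bits to 4, at the cost of the rounding error you see printed; the observation above is that larger models seem to absorb that error better.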