
ortegaalfredo t1_je5urre wrote

I run a Discord server with all the models. Currently only 30B and 65B, because nobody uses the smaller LLMs.

Even if superficially they can both answer questions, on complex topics 65B is much better than 30B, and 7B doesn't even compare.

11

machineko t1_je86hwt wrote

What GPUs are you using to run them? Are you using any compression (e.g., quantization)?

1

ortegaalfredo t1_jegn9zu wrote

2x RTX 3090. The 65B model runs in int4, the 30B in int8 (int8 is required for LoRA).

2
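For reference, a minimal sketch of what the 30B int8 + LoRA setup described above might look like, assuming the Hugging Face transformers/peft/bitsandbytes stack; the checkpoint name and LoRA hyperparameters below are illustrative placeholders, not the commenter's actual configuration:

```python
# Sketch: load a LLaMA-style checkpoint in int8 across two GPUs
# and attach LoRA adapters for fine-tuning. Assumes transformers,
# peft, and bitsandbytes are installed; model name is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "huggyllama/llama-30b"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)

# load_in_8bit quantizes the weights to int8 via bitsandbytes;
# device_map="auto" shards the layers across both 3090s.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Prepare the int8 model for training and attach LoRA adapters,
# so only the small adapter matrices receive gradient updates.
model = prepare_model_for_int8_training(model)
lora_config = LoraConfig(
    r=8,                                   # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The 65B int4 case would typically go through a dedicated int4 loader (e.g., a GPTQ-quantized checkpoint) rather than this 8-bit path, which is why LoRA training is tied to the int8 30B model here.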