ortegaalfredo t1_je5urre wrote

I run a Discord with all the models. Currently only 30B and 65B, because nobody uses the smaller LLMs.

Even if, superficially, they can both answer questions, on complex topics 65B is much better than 30B, and 7B doesn't even compare.

11

ortegaalfredo OP t1_jbaaqv5 wrote

The most important thing is to create multi-process int8 quantization; that would allow it to run on 4x3090 GPU cards. Right now it requires 8x3090 GPUs, and that's way over my budget.
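
The arithmetic behind that: 65B parameters at fp16 is roughly 130 GB of weights alone, which is why it currently takes eight 24 GB 3090s; int8 halves the weights to about 65 GB, which can fit across four 3090s (96 GB total) with room left for activations. A minimal sketch of what such a multi-GPU int8 load might look like with Hugging Face transformers + bitsandbytes (the checkpoint name and per-GPU memory caps here are illustrative assumptions, not the exact setup described above):

```python
# Hedged sketch: load a 65B model with int8 weights sharded across several GPUs
# using Hugging Face transformers + bitsandbytes. Checkpoint name and memory
# limits are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-65b"  # hypothetical checkpoint identifier

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # int8 weights: ~65 GB instead of ~130 GB in fp16
    device_map="auto",   # shard layers automatically across all visible GPUs
    max_memory={i: "22GiB" for i in range(4)},  # e.g. 4x RTX 3090 (24 GB each)
)

prompt = "Explain int8 quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```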

Or just wait a few days; I'm told some guys have 2xA100 cards and will open a 65B model to the public this week.

11