Submitted by ortegaalfredo t3_11kr20f in MachineLearning
SrPeixinho t1_jb96nyt wrote
Can I donate or help somehow to make it 65B?
ortegaalfredo OP t1_jbaaqv5 wrote
The most important thing is to create multi-process int8 quantization; that would allow it to run on 4x3090 GPUs. Right now it requires 8x3090 GPUs, and that's way over my budget.
Or just wait a few days; I'm told some guys with 2xA100 cards will open a 65B model to the public this week.
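[Editor's note: not from the thread — a minimal sketch of the kind of int8 multi-GPU loading being discussed, assuming Hugging Face transformers with accelerate and bitsandbytes installed. `MODEL_PATH` is a hypothetical local checkpoint path. At fp16, 65B params need roughly 130 GB of weights (hence 8x3090 at 24 GB each); int8 halves that to ~65 GB, which fits on 4x3090.]

```python
# Sketch: load a LLaMA-class model in int8, sharded across all visible GPUs.
# Assumes: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/llama-65b"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_8bit=True,   # int8 weights via bitsandbytes, ~half the VRAM of fp16
    device_map="auto",   # shard layers across available GPUs (e.g. 4x3090)
)

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```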
SpaceCockatoo t1_jblj2so wrote
4bit quant already out
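[Editor's note: the 4-bit quant referred to here is most likely a GPTQ-style release (e.g. GPTQ-for-LLaMa). As a hedged sketch only, a bitsandbytes 4-bit load via transformers is an alternative route; `MODEL_PATH` is again hypothetical. At 4 bits, 65B params need roughly 32.5 GB of weights, so the model fits on far fewer cards.]

```python
# Sketch: 4-bit load via bitsandbytes, an alternative to GPTQ-style quantization.
from transformers import AutoModelForCausalLM

MODEL_PATH = "path/to/llama-65b"  # hypothetical local checkpoint

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_4bit=True,   # 4-bit weights via bitsandbytes (~32.5 GB for 65B)
    device_map="auto",
)
```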
ortegaalfredo OP t1_jbov7dl wrote
Tried the 8-bit; for some reason the 4-bit doesn't work for me yet.
Problem is, it's very, very slow: about 1 token/sec, compared to the ~100 tokens/s I get with 13B.
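[Editor's note: a minimal sketch, not from the thread, of how throughput figures like "1 token/sec" vs "100 tokens/s" can be measured; `tokens_per_second` is a hypothetical helper assuming a loaded transformers model and tokenizer.]

```python
import time

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=64):
    """Time a single generate() call and return generated tokens per second."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    # Count only newly generated tokens, excluding the prompt.
    generated = output.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed

# Usage: print(tokens_per_second(model, tokenizer, "Hello, world"))
```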