
SrPeixinho t1_jb96nyt wrote

Can I donate or help somehow to make it 65B?

11

ortegaalfredo OP t1_jbaaqv5 wrote

The most important thing is to create a multi-process quantization to int8; this would allow it to run on 4x3090 GPUs. Right now it requires 8x3090 GPUs, and that's way over my budget.

Or just wait a few days; I'm told some guys have 2xA100 cards and will open a 65B model to the public this week.

11
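For a rough sense of why int8 roughly halves the GPU count, here's weights-only VRAM arithmetic (a sketch: the 65B parameter count and 24 GB per 3090 are from the thread; real runs need extra headroom for activations and the KV cache, which is why 8 and 4 cards get quoted rather than the bare minimums below):

```python
# Back-of-the-envelope VRAM math for a 65B-parameter model.
# Weights only; activations and KV cache add overhead on top.
PARAMS = 65e9
GPU_VRAM_GB = 24  # one RTX 3090

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    gpus_needed = -(-weights_gb // GPU_VRAM_GB)  # ceiling division
    print(f"{name}: {weights_gb:.1f} GB of weights -> at least {gpus_needed:.0f}x 3090")
```

At fp16 the weights alone are 130 GB; int8 cuts that to 65 GB and int4 to ~33 GB, which is why 4-bit builds can squeeze onto far fewer cards.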

SpaceCockatoo t1_jblj2so wrote

4bit quant already out

2

ortegaalfredo OP t1_jbov7dl wrote

Tried the 8bit; the 4bit for some reason doesn't work for me yet.

Problem is, those are very, very slow: about 1 token/sec, compared to the ~100 tokens/sec I'm getting with 13B.

1