Submitted by ortegaalfredo t3_11kr20f in MachineLearning
SrPeixinho t1_jb96nyt wrote
Can I donate or help somehow to make it 65B?
ortegaalfredo OP t1_jbaaqv5 wrote
The most important thing is to create multi-process int8 quantization; that would allow it to run on 4x3090 GPUs. Right now it requires 8x3090 GPUs, and that's way over my budget.
Or just wait a few days; I'm told some guys with 2xA100 cards will open a 65B model to the public this week.
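[Editor's note: not from the thread — a minimal sketch of the kind of int8 multi-GPU loading being discussed, assuming Hugging Face transformers with accelerate and bitsandbytes installed. `MODEL_PATH` is a hypothetical local checkpoint path. At fp16, 65B params need roughly 130 GB of weights (hence 8x3090 at 24 GB each); int8 halves that to ~65 GB, which fits on 4x3090.]

```python
# Sketch: load a LLaMA-class model in int8, sharded across all visible GPUs.
# Assumes: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/llama-65b"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_8bit=True,   # int8 weights via bitsandbytes, ~half the VRAM of fp16
    device_map="auto",   # shard layers across available GPUs (e.g. 4x3090)
)

prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```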
SpaceCockatoo t1_jblj2so wrote
4bit quant already out
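[Editor's note: the 4-bit quant referred to here is most likely a GPTQ-style release (e.g. GPTQ-for-LLaMa). As a hedged sketch only, a bitsandbytes 4-bit load via transformers is an alternative route; `MODEL_PATH` is again hypothetical. At 4 bits, 65B params need roughly 32.5 GB of weights, so the model fits on far fewer cards.]

```python
# Sketch: 4-bit load via bitsandbytes, an alternative to GPTQ-style quantization.
from transformers import AutoModelForCausalLM

MODEL_PATH = "path/to/llama-65b"  # hypothetical local checkpoint

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_4bit=True,   # 4-bit weights via bitsandbytes (~32.5 GB for 65B)
    device_map="auto",
)
```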
ortegaalfredo OP t1_jbov7dl wrote
Tried the 8-bit; for some reason the 4-bit doesn't work for me yet.
Problem is, it's very, very slow: about 1 token/sec, compared to the ~100 tokens/s I get with 13B.
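[Editor's note: a minimal sketch, not from the thread, of how throughput figures like "1 token/sec" vs "100 tokens/s" can be measured; `tokens_per_second` is a hypothetical helper assuming a loaded transformers model and tokenizer.]

```python
import time

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=64):
    """Time a single generate() call and return generated tokens per second."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    start = time.perf_counter()
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    elapsed = time.perf_counter() - start
    # Count only newly generated tokens, excluding the prompt.
    generated = output.shape[1] - inputs["input_ids"].shape[1]
    return generated / elapsed

# Usage: print(tokens_per_second(model, tokenizer, "Hello, world"))
```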