Submitted by ortegaalfredo t3_11kr20f in MachineLearning
SpaceCockatoo t1_jblj2so wrote
Reply to comment by ortegaalfredo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
4bit quant already out
ortegaalfredo OP t1_jbov7dl wrote
Tried the 8-bit; the 4-bit for some reason doesn't work for me yet.
Problem is, those are very slow, about 1 token/s, whereas with 13B I'm getting 100 tokens/s.