Submitted by jsonathan t3_106q6m9 in MachineLearning
LetterRip t1_j3n91mt wrote
Reply to comment by IshKebab in [P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 by jsonathan
I'd do GLM-130B
> With INT4 quantization, the hardware requirements can further be reduced to a single server with 4 * RTX 3090 (24G) with almost no performance degradation.
https://github.com/THUDM/GLM-130B
I'd also look into pruning/distillation; you could probably shrink the model by about another half.
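The quoted hardware figure checks out with simple arithmetic, and so does the pruning estimate. A minimal back-of-envelope sketch (my own numbers, not from the repo; 130B parameters at 4 bits per weight, ignoring activation/KV-cache overhead):

```python
# Back-of-envelope VRAM check for running GLM-130B in INT4.
params = 130e9                       # 130B parameters
int4_bytes = params * 0.5            # 4 bits = 0.5 bytes per weight
weights_gib = int4_bytes / 1024**3   # ~60.5 GiB of weights
vram_gib = 4 * 24                    # 4 x RTX 3090 at 24 GiB each = 96 GiB

print(f"INT4 weights: {weights_gib:.1f} GiB vs {vram_gib} GiB total VRAM")

# If pruning/distillation roughly halved the parameter count,
# the weight footprint would halve again:
pruned_gib = weights_gib / 2         # ~30 GiB
```

So the INT4 weights leave roughly 35 GiB of headroom across the four cards for activations and cache, and a halved model would fit comfortably on two.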