Submitted by starstruckmon t3_1027geh in MachineLearning
itsnotlupus t1_j2tbhzu wrote
Can you prune a pruned model? And then prune that again?
There's apparently no retraining needed here: just loop over the matrices and shrink them (although it would be nicer if there were a code repo to actually see that in action).
I get that each successive pruning is going to make things increasingly worse, but I'm wondering if this means you could take an OPT-175B model and shrink it down to something the size of OPT-6.7B that fits on commodity hardware, while still being closer in performance to the larger original model than to the natively smaller one.
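In lieu of a repo, here's roughly what I imagine the naive version of that loop looks like — a minimal sketch of plain magnitude pruning in PyTorch, not necessarily the paper's actual criterion, with made-up function names and an arbitrary 50% sparsity:

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float = 0.5) -> None:
    """Zero out the smallest-magnitude `sparsity` fraction of a weight matrix, in place."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return
    threshold = weight.abs().flatten().kthvalue(k).values
    weight[weight.abs() <= threshold] = 0.0

def prune_model_(model: torch.nn.Module, sparsity: float = 0.5) -> None:
    """Loop over every linear layer and prune its weight matrix -- no retraining."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, torch.nn.Linear):
                magnitude_prune_(module.weight, sparsity)
```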
cdsmith t1_j2uzks4 wrote
The idea is that there's an inflection point: at first you are mainly removing (masking with zeros) dimensions whose values are extremely small anyway and don't make much difference in the response, so you don't lose much accuracy. But after you've removed those dimensions, the remaining ones are specifically the ones that do matter, so you can't just go find more non-impactful dimensions again; they are already gone.
As for what would happen if you over-pruned a model trained with a large number of parameters, I'd naively expect it to do much worse. If you train with more parameters and then zero out significant weights, then not only do you have a lower-dimensional space to model in (which is unavoidable), but you also lose the information correlated with the dimensions you've kept, because at training time the model relied on the parameters you've now zeroed out to capture that information.
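A toy way to see that inflection point (a random matrix as a stand-in for a trained layer, with sizes and sparsity levels picked arbitrarily): the relative output error grows much faster than the fraction of weights removed, because each additional round of pruning has to cut into weights that actually matter.

```python
import torch

torch.manual_seed(0)
W = torch.randn(2048, 2048)      # stand-in for one trained weight matrix
x = torch.randn(512, 2048)       # stand-in for activations flowing through it
y_ref = x @ W.T

for sparsity in (0.25, 0.50, 0.75, 0.90):
    # Zero out the smallest-magnitude `sparsity` fraction of W and measure the damage.
    k = int(W.numel() * sparsity)
    threshold = W.abs().flatten().kthvalue(k).values
    W_pruned = torch.where(W.abs() <= threshold, torch.zeros_like(W), W)
    rel_err = ((x @ W_pruned.T - y_ref).norm() / y_ref.norm()).item()
    print(f"sparsity={sparsity:.2f}  relative output error={rel_err:.3f}")
```

On a Gaussian matrix the smallest 25% of weights carries only a tiny share of the total magnitude, so the first pass is nearly free, while each later step removes progressively larger weights. A trained layer isn't Gaussian, but that's the shape of the argument.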
visarga t1_j2yvpjs wrote
Recent papers have shown that even small models under 10B parameters can benefit from training on multi-task data; learning to solve a large number of tasks works even when the model isn't over 60B parameters.
But no model comes even close to 50% of GPT-3's scores, not counting closed models.
drooobie t1_j2tgxnh wrote
It's probably approximately idempotent.
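At least for plain magnitude pruning at a fixed sparsity it's exactly idempotent, not just approximately — a toy check (nothing specific to the paper's criterion; iteratively pruning the surviving weights each round would be a different story):

```python
import torch

torch.manual_seed(0)

def prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Return a copy with the smallest-magnitude `sparsity` fraction zeroed."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return torch.where(weight.abs() <= threshold, torch.zeros_like(weight), weight)

W = torch.randn(1024, 1024)
once = prune(W)
twice = prune(once)

# The second pass's threshold lands on the zeros the first pass already left,
# so nothing new gets removed.
print(torch.equal(once, twice))  # True
```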