Submitted by Secure-Technology-78 t3_10mdhxb in MachineLearning
starfries t1_j6l0aeq wrote
Reply to comment by anony_sci_guy in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Thanks for that resource, I've been experimenting with the lottery ticket method but that's a lot of papers I haven't seen! Did you initialize the weights as if training from scratch, or did you do something like matching the variance of the old and new weights? I'm intrigued that your method didn't hurt performance; most of the things I've tested were detrimental to the network. I have seen some performance improvements under different conditions, but I'm still trying to rule out confounding factors.
anony_sci_guy t1_j6mr4k6 wrote
Glad it helped! The first thing I tried was just re-initializing as at the beginning of training, but I don't remember how much I dug into modifying that before moving on. That's great you're seeing some improvements though! Would love to hear how the rest of your experiment goes!! =)
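For anyone following along, the two re-initialization strategies discussed above can be sketched roughly like this. This is a minimal illustration, not either commenter's actual code: the layer shapes, the magnitude-based pruning mask, and the Glorot-style scale are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical trained dense layer, standing in for a real network.
fan_in, fan_out = 64, 32
trained = rng.normal(0.0, 0.3, size=(fan_in, fan_out))

# Magnitude-based pruning mask (assumed for the example): keep the
# largest 20% of weights, zero out the rest.
mask = np.abs(trained) > np.quantile(np.abs(trained), 0.8)

# Option 1: re-initialize survivors as if training from scratch,
# e.g. a Glorot/Xavier-style draw based only on the layer shape.
scale = np.sqrt(2.0 / (fan_in + fan_out))
fresh = rng.normal(0.0, scale, size=trained.shape) * mask

# Option 2: re-initialize, but match the variance of the old
# surviving weights (the idea raised in the question above).
old_std = trained[mask].std()
variance_matched = rng.normal(0.0, old_std, size=trained.shape) * mask
```

In both cases the pruned positions stay zero; the two options differ only in the scale of the new random draw for the surviving connections.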