nmfisher t1_j62y29r wrote

Slight tangent - has anyone ever tried "fine-tuning" a large speech recognition model (e.g. Whisper) by feeding it a training set and pruning activations? The idea is that only a subset of weights/activations is necessary for a given speaker/dataset, so you could compress a large model into a smaller one that performs equally well on that subset of data (and then continue training it conventionally). Presumably this would require some degree of sparsity to begin with?
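
A minimal sketch of the idea using PyTorch's built-in pruning utilities; the tiny `model` here is a stand-in for a loaded Whisper checkpoint, and the 50% pruning amount is an arbitrary assumption:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for a loaded Whisper checkpoint (which is mostly
# attention/MLP Linear layers).
model = nn.Sequential(nn.Linear(80, 512), nn.GELU(), nn.Linear(512, 512))

# Zero out the smallest-magnitude 50% of Linear weights globally,
# on the assumption that only a subset matters for one speaker.
params_to_prune = [
    (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
]
prune.global_unstructured(
    params_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# The masks hold pruned weights at zero while the survivors keep
# training on the speaker-specific data; afterwards the masks can
# be baked in to get an ordinary (sparse) state dict.
for module, name in params_to_prune:
    prune.remove(module, name)
```

In practice you'd probably derive the pruning criterion from the speaker-specific data (e.g. a gradient-based saliency score) rather than plain weight magnitude, but the mask-then-continue-training loop is the same.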

29

_Ruffy_ t1_j635zdc wrote

Good idea in principle, anyone know more about this or any references?

5

anony_sci_guy t1_j63nj0u wrote

This was exactly my first thought too - free up all those extra parameters and re-randomize them. The catch is that there could be a big distributional gap between the pre-tuned weights and the re-randomized ones, so you'd want different step sizes for the two groups. I've played with it before and ran into exactly this problem, but got too lazy to actually implement a solution. (I'm actually a biologist, so I don't really have the bandwidth to dig into the ML side as much.)
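
A rough sketch of one way to get those different step sizes, assuming a `mask` marking the re-randomized entries and an arbitrary 100x gradient boost (both made up for illustration):

```python
import torch
import torch.nn as nn

layer = nn.Linear(512, 512)  # stand-in for one pre-tuned layer

# Pretend half the entries were pruned and then re-randomized.
mask = torch.rand_like(layer.weight) < 0.5
with torch.no_grad():
    layer.weight[mask] = torch.randn(int(mask.sum())) * 0.02

base_lr, boost = 1e-5, 100.0  # assumed values

# Per-element "step sizes": scale the gradients of the re-randomized
# entries so they effectively train with a much larger learning rate.
def scale_fresh_grads(grad):
    return torch.where(mask, grad * boost, grad)

layer.weight.register_hook(scale_fresh_grads)
optimizer = torch.optim.SGD(layer.parameters(), lr=base_lr)
```

Note this only behaves like a per-element learning rate with plain SGD; an adaptive optimizer like Adam would largely normalize the scaling away.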

3

starfries t1_j64qhqa wrote

Can you elaborate on this? I'm trying something similar, so I'm curious what your results were and if you ran across any literature about this idea.

2

anony_sci_guy t1_j681trq wrote

Yeah, there is some stuff published out there. It's related to pruning (A link to a ton of papers on it); the lottery ticket method handles this well, because you're re-training from scratch, just with a "lucky" selection of the initialized weights. Results-wise, I never got anything to improve, because of the distributional changes caused by re-randomizing a subset of weights in the middle of training. I still saw the same level of performance as without re-randomizing, but that basically just showed that the way I was re-randomizing was neither helping nor hurting, b/c those neurons weren't important...
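
For anyone following along, a bare-bones sketch of the lottery-ticket recipe being described (prune after training, then rewind the surviving weights to their initial values); the toy model and 80% pruning rate are assumptions:

```python
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 10))
init_state = copy.deepcopy(model.state_dict())  # snapshot at init

# ... train `model` to convergence here ...

# Keep only the largest 20% of trained weights, then rewind the
# survivors to their initial values: the "lucky" subnetwork is
# re-trained from scratch under the same mask.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.8)
        with torch.no_grad():
            module.weight_orig.copy_(init_state[f"{name}.weight"])
```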

2

starfries t1_j6l0aeq wrote

Thanks for that resource - I've been experimenting with the lottery ticket method, but that's a lot of papers I haven't seen! Did you initialize the weights as if training from scratch, or did you do something like matching the variance of the old and new weights? I'm intrigued that your method didn't hurt performance - most of the things I've tested were detrimental to the network. I have seen some performance improvements under different conditions, but I'm still trying to rule out confounding factors.
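
For concreteness, the variance-matching option mentioned above might look something like this hypothetical helper (the name and the mean/std matching are my own illustration, not anything from the papers):

```python
import torch

def rerandomize_matched(weight: torch.Tensor, mask: torch.Tensor) -> None:
    """Re-init the pruned entries (mask == True) so their scale matches
    the surviving trained weights rather than the init-time distribution."""
    with torch.no_grad():
        survivors = weight[~mask]
        weight[mask] = (
            torch.randn(int(mask.sum())) * survivors.std() + survivors.mean()
        )
```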

1

anony_sci_guy t1_j6mr4k6 wrote

Glad it helped! The first thing I tried was just re-initializing as at the beginning of training, but I don't remember how much I dug into modifying it before moving on. That's great that you're seeing some improvements, though! Would love to hear how the rest of your experiment goes!! =)

2

ApprehensiveNature69 t1_j651pux wrote

Yep! This is a known technique - if you search for "sparse fine-tuning", lots of papers show up; it's a very valid approach.

2