r2m2 t1_j64uah5 wrote

Isn’t this a (somewhat) well-known “free lunch” effect with naive one-shot magnitude pruning? I feel like this is a folklore fact for many models like ResNet/VGG (and a paper from a few years back validated the same for BERT).
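For context, “naive one-shot magnitude pruning” just means zeroing the smallest-magnitude weights in a single pass, no retraining. A minimal numpy sketch (function name and the 90% sparsity level are my own illustration, not from any specific paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Zero out the `sparsity` fraction of weights with smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only weights above it
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, sparsity=0.9)
print(1 - np.count_nonzero(pruned) / pruned.size)  # ~0.9 of weights zeroed
```

The “free lunch” claim people make is that for over-parameterized nets, doing this at fairly high sparsity often costs little accuracy even before fine-tuning.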

2