Submitted by starstruckmon t3_1027geh in MachineLearning
learn-deeply t1_j2u53ek wrote
Reply to comment by bloc97 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
My unsubstantiated hypothesis: BLOOM is severely undertrained, so most neurons aren't contributing at all to the final result compared to OPT-175.
ElectronicCress3132 t1_j2v4vy4 wrote
Could you elaborate on what you mean by undertrained?
learn-deeply t1_j2vac5q wrote
The model hasn't reached convergence, and/or the training dataset was too small.
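A toy sketch of the intuition above (not the paper's actual SparseGPT method, and all sizes and names here are made up): if a model is undertrained, many weights remain near their small initialization values and contribute little, so even crude one-shot magnitude pruning removes a large fraction of them with only a small change in the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an "undertrained" 256x256 weight matrix: most rows keep tiny
# near-initialization weights; only a few rows have learned large weights.
W = rng.normal(0.0, 0.01, size=(256, 256))
trained_rows = rng.choice(256, size=32, replace=False)
W[trained_rows] = rng.normal(0.0, 1.0, size=(32, 256))

x = rng.normal(size=256)
y = W @ x

# One-shot magnitude pruning: zero out the 80% smallest-magnitude weights.
sparsity = 0.8
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
y_pruned = W_pruned @ x

# The relative output error stays small because the pruned weights were the
# near-zero ones that barely contributed to the result in the first place.
rel_err = np.linalg.norm(y - y_pruned) / np.linalg.norm(y)
print(f"relative output error at {sparsity:.0%} sparsity: {rel_err:.4f}")
```

In a well-trained model the weight magnitudes are spread out more evenly, so the same 80% pruning would discard weights that actually matter and the error would be much larger.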