Submitted by starstruckmon t3_1027geh in MachineLearning
learn-deeply t1_j2u53ek wrote
Reply to comment by bloc97 in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
My unsubstantiated hypothesis: BLOOM is severely undertrained, so most neurons aren't contributing at all to the final result compared to OPT-175.
ElectronicCress3132 t1_j2v4vy4 wrote
Could you elaborate on what you mean by undertrained?
learn-deeply t1_j2vac5q wrote
The model hasn't reached convergence, and/or the training dataset was too small.
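A toy sketch of the intuition above (not the paper's actual SparseGPT method, and all sizes and names here are made up): if a model is undertrained, many weights remain near their small initialization values and contribute little, so even crude one-shot magnitude pruning removes a large fraction of them with only a small change in the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an "undertrained" 256x256 weight matrix: most rows keep tiny
# near-initialization weights; only a few rows have learned large weights.
W = rng.normal(0.0, 0.01, size=(256, 256))
trained_rows = rng.choice(256, size=32, replace=False)
W[trained_rows] = rng.normal(0.0, 1.0, size=(32, 256))

x = rng.normal(size=256)
y = W @ x

# One-shot magnitude pruning: zero out the 80% smallest-magnitude weights.
sparsity = 0.8
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)
y_pruned = W_pruned @ x

# The relative output error stays small because the pruned weights were the
# near-zero ones that barely contributed to the result in the first place.
rel_err = np.linalg.norm(y - y_pruned) / np.linalg.norm(y)
print(f"relative output error at {sparsity:.0%} sparsity: {rel_err:.4f}")
```

In a well-trained model the weight magnitudes are spread out more evenly, so the same 80% pruning would discard weights that actually matter and the error would be much larger.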