Zealousideal_Low1287 t1_jdlm2c0 wrote on March 25, 2023 at 8:36 AM

It seems that, contrary to conventional wisdom, models with more parameters learn more efficiently. My personal ‘hunch’ is that training large models and then applying some form of distillation may become the standard thing to do.