Zealousideal_Low1287 t1_jdlm2c0 wrote

It seems that, contrary to conventional wisdom, models with more parameters learn more efficiently. My personal ‘hunch’ is that training large models and then applying some form of distillation may become the standard approach.
