[D] Do we really need 100B+ parameters in a large language model?
Submitted by Vegetable-Skill-9700 on March 25, 2023 at 4:14 AM in r/MachineLearning · 101 points · 84 comments
Zealousideal_Low1287 wrote on March 25, 2023 at 8:36 AM (7 points):
It seems that, contrary to conventional wisdom, models with more parameters learn more efficiently. My personal hunch is that training large models and then applying some form of distillation will become the standard thing to do.
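For context, "distillation" here refers to knowledge distillation (Hinton et al., 2015): a small student model is trained to match the softened output distribution of a large teacher as well as the ground-truth labels. Below is a minimal PyTorch sketch of the standard distillation loss; the function name, hyperparameters, and toy data are illustrative, not from the thread.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend of a soft-target KL term and hard-label cross-entropy.

    temperature > 1 softens both distributions so the student can
    learn from the teacher's relative probabilities over wrong classes;
    alpha balances imitating the teacher vs. fitting the labels.
    """
    # KL divergence between the teacher's and student's
    # temperature-softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Ordinary supervised loss on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples over 10 classes.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)  # from the frozen large model
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```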