Do we really need 100B+ parameters in a large language model?
Submitted by Vegetable-Skill-9700 t3_121agx4 on March 25, 2023 at 4:24 AM in deeplearning 54 comments 43
fysmoe1121 t1_jdm2mtw wrote on March 25, 2023 at 12:16 PM 2
deep double descent => bigger is better