Submitted by Vegetable-Skill-9700 t3_121agx4 in deeplearning
Jaffa6 t1_jdlk3j9 wrote
There was a paper a while back (Chinchilla?) that indicated that for the best results, model size and the amount of data you give it should grow proportionally and that many then-SotA models were undertrained in terms of how much data they were given. You might find it interesting.
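For a rough sense of what "grow proportionally" means in practice, here's a back-of-the-envelope sketch. It leans on two assumptions that aren't stated in this comment: the commonly quoted rule of thumb of about 20 training tokens per parameter, and the approximation that training cost is about 6 FLOPs per parameter per token. The function name and constants are made up for illustration, so treat the numbers as ballpark only.

```python
import math

# Both constants are assumptions for illustration, not values from this thread.
TOKENS_PER_PARAM = 20.0       # rule-of-thumb ratio of training tokens to parameters
FLOPS_PER_PARAM_TOKEN = 6.0   # rough training cost: C ~= 6 * N * D


def compute_optimal_split(flops_budget):
    """Split a training FLOPs budget into (params, tokens) under the assumptions above."""
    # With C = 6 * N * D and D = 20 * N, we get C = 120 * N**2.
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


if __name__ == "__main__":
    for budget in (1.2e20, 4.8e20):
        n, d = compute_optimal_split(budget)
        print(f"budget={budget:.1e} FLOPs -> params={n:.2e}, tokens={d:.2e}")
```

Under these assumptions, quadrupling the compute budget doubles both the parameter count and the token count, which is the "grow proportionally" part.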
But as a tangent, I think ML focuses too much on chasing accuracy. You see it constantly in SotA papers where they're claiming things like "We improved our GLUE score by 0.1 compared to SotA, and all it took was spending the GDP of Switzerland on electricity and GPUs!"
And it's still a model that hallucinates way too much, contains bias, and just generally isn't worth all that time, money, and pollution.
Vegetable-Skill-9700 OP t1_jdpr474 wrote
Thanks for sharing! It's a great read, and I agree that most of the current models are most likely under-trained.
Jaffa6 t1_jdq1rua wrote
It's worth noting that some models were designed according to this once it came out, and I believe it did have some impact in the community, but yeah, it wouldn't surprise me if it's still a problem.
Glad you liked it!