visarga t1_j2yvpjs wrote
Reply to comment by cdsmith in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Recent papers have shown that even small models under 10B parameters benefit from training on multi-task data. Learning to solve a large number of tasks works even when the model is well under 60B parameters.
But excluding closed models, none comes even close to 50% of GPT-3's scores.
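To make the multi-task idea concrete, here's a minimal sketch of FLAN/T0-style task mixing, where every training batch draws examples from many tasks at once. The task names and toy examples below are made up for illustration; a real setup would sample from actual datasets with tuned mixing weights.

```python
import random

# Hypothetical toy task pools; in practice these would be real datasets
# (e.g., the instruction-tuning mixtures used in FLAN/T0-style training).
TASKS = {
    "summarize": [("Summarize: The cat sat on the mat.", "A cat sat on a mat.")],
    "translate": [("Translate to French: Hello.", "Bonjour.")],
    "qa":        [("Q: What is 2+2? A:", "4")],
}

def sample_multitask_batch(tasks, batch_size, seed=0):
    """Draw a batch mixing examples from all tasks uniformly,
    so the model sees many different tasks in every step."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        task = rng.choice(list(tasks))            # pick a task at random
        prompt, target = rng.choice(tasks[task])  # pick an example from it
        batch.append({"task": task, "prompt": prompt, "target": target})
    return batch

if __name__ == "__main__":
    for ex in sample_multitask_batch(TASKS, batch_size=4):
        print(ex["task"], "->", ex["prompt"], "|", ex["target"])
```

The point of mixing at the batch level rather than training tasks sequentially is to avoid catastrophic forgetting and let the model share structure across tasks, which is where the gains for sub-10B models seem to come from.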