visarga t1_j2yvpjs wrote
Reply to comment by cdsmith in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Recent papers have shown that even small models under 10B parameters benefit from training on multi-task data. Learning to solve a large number of tasks works even when the model is well under 60B parameters.
But excluding closed models, none comes even close to 50% of GPT-3's scores.
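To make the multi-task idea concrete, here's a minimal sketch of FLAN/T0-style task mixing, where every training batch draws examples from many tasks at once. The task names and toy examples below are made up for illustration; a real setup would sample from actual datasets with tuned mixing weights.

```python
import random

# Hypothetical toy task pools; in practice these would be real datasets
# (e.g., the instruction-tuning mixtures used in FLAN/T0-style training).
TASKS = {
    "summarize": [("Summarize: The cat sat on the mat.", "A cat sat on a mat.")],
    "translate": [("Translate to French: Hello.", "Bonjour.")],
    "qa":        [("Q: What is 2+2? A:", "4")],
}

def sample_multitask_batch(tasks, batch_size, seed=0):
    """Draw a batch mixing examples from all tasks uniformly,
    so the model sees many different tasks in every step."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        task = rng.choice(list(tasks))            # pick a task at random
        prompt, target = rng.choice(tasks[task])  # pick an example from it
        batch.append({"task": task, "prompt": prompt, "target": target})
    return batch

if __name__ == "__main__":
    for ex in sample_multitask_batch(TASKS, batch_size=4):
        print(ex["task"], "->", ex["prompt"], "|", ex["target"])
```

The point of mixing at the batch level rather than training tasks sequentially is to avoid catastrophic forgetting and let the model share structure across tasks, which is where the gains for sub-10B models seem to come from.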