Submitted by __Maximum__ t3_11l3as6 in MachineLearning
__Maximum__ OP t1_jbdqy5c wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Thanks for the links. Looks like RoBERTa did not gain a lot from the additional training, only minor improvements, but yeah, it was a tiny model. How was that not a good enough lesson? Why did people need Chinchilla? Maybe it's just that gathering a lot of data comes easy, so people collect as much as possible, even though they know they will go over it for at most one epoch.
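For context, here's a rough back-of-the-envelope sketch (mine, not from either paper verbatim) of Chinchilla's headline rule of thumb, roughly ~20 training tokens per parameter for compute-optimal training. The model sizes below are just illustrative picks:

```python
# Sketch of the Chinchilla rule of thumb (Hoffmann et al., 2022):
# compute-optimal training uses roughly ~20 tokens per model parameter.
# tokens_per_param=20 is the commonly cited approximation, not an exact law.

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training token count for a given parameter count."""
    return tokens_per_param * n_params

for n_params in [1.3e9, 7e9, 70e9]:  # illustrative model sizes
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{n_params / 1e9:.1f}B params -> ~{tokens / 1e9:.0f}B tokens compute-optimal")
```

By that rule, a 7B model is compute-optimal at roughly ~140B tokens, while LLaMA-7B was trained on about 1T, far past that point. If I understand the LLaMA paper right, that's deliberate: they overtrain smaller models to get better quality per unit of *inference* cost, whereas Chinchilla optimizes for *training* compute, which is basically the discrepancy this thread is about.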