ganzzahl t1_jdovu3h wrote on March 26, 2023 at 1:00 AM

Reply to comment by itshouldjustglide in [D] Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700

I'm also very interested in this – does anyone have papers similar to Chinchilla, but without the training FLOPs restriction, and instead comparing identical dataset sizes?
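To make the distinction concrete, here is a rough sketch of the two setups using the common rule of thumb that training compute is about 6·N·D FLOPs (N parameters, D training tokens). The model sizes, FLOPs budget, and dataset size are made-up illustrative values, not figures from Chinchilla or any other paper, and the function names are hypothetical.

```python
# Rough sketch: fixed-compute vs. fixed-data comparisons,
# using the common approximation C ≈ 6 * N * D
# (N = parameter count, D = training tokens). All numbers are illustrative.

def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training FLOPs via the 6*N*D rule of thumb."""
    return 6.0 * n_params * n_tokens

def tokens_for_budget(n_params: float, flops_budget: float) -> float:
    """Tokens a model of n_params can be trained on under a fixed FLOPs budget."""
    return flops_budget / (6.0 * n_params)

MODEL_SIZES = [1e9, 10e9, 100e9]  # hypothetical parameter counts

# Fixed-compute setting: larger models get fewer training tokens.
BUDGET = 6e22  # hypothetical FLOPs budget
fixed_compute = {n: tokens_for_budget(n, BUDGET) for n in MODEL_SIZES}

# Fixed-data setting: every model sees the same tokens, so compute grows with size.
DATASET_TOKENS = 300e9  # hypothetical dataset size
fixed_data = {n: train_flops(n, DATASET_TOKENS) for n in MODEL_SIZES}

if __name__ == "__main__":
    for n in MODEL_SIZES:
        print(f"N={n:.0e}  fixed-compute tokens={fixed_compute[n]:.2e}  "
              f"fixed-data FLOPs={fixed_data[n]:.2e}")
```

Under a fixed budget the token count shrinks as the model grows, so model size and data size are confounded; holding the dataset fixed isolates the effect of parameter count at the cost of unequal compute.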

An aside: I feel like I remember some older MT papers where LSTMs outperformed Transformers for some low-resource languages, but I think that's outdated – with transfer learning, multilingual models and synthetic data, I'm fairly certain Transformers always outperform LSTMs nowadays.

PilotThen t1_jdpnoul wrote on March 26, 2023 at 5:05 AM

I didn't find a paper, but I think that is sort of what EleutherAI was doing with their Pythia models.

You'll find the models on Hugging Face, and I'd say they are also interesting from an open-source perspective because of their license (Apache-2.0).
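For anyone who wants to try one, below is a minimal sketch of loading a model with the `transformers` library. The repo ID pattern (`EleutherAI/pythia-<size>`, with an optional `-deduped` suffix) is an assumption about how the suite is published, so check the model cards before relying on it; `pythia_repo_id` and `load_pythia` are hypothetical helper names for this example.

```python
# Minimal sketch: load a Pythia model from the Hugging Face Hub.
# Assumes `pip install transformers torch` and network access.

def pythia_repo_id(size: str, deduped: bool = False) -> str:
    """Build a Hub repo ID such as 'EleutherAI/pythia-70m' (assumed naming)."""
    suffix = "-deduped" if deduped else ""
    return f"EleutherAI/pythia-{size}{suffix}"

def load_pythia(size: str = "70m", revision: str = "main"):
    """Download tokenizer and model; `revision` selects a branch or tag on the Hub."""
    # Imported lazily so pythia_repo_id works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = pythia_repo_id(size)
    tokenizer = AutoTokenizer.from_pretrained(repo, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(repo, revision=revision)
    return tokenizer, model
```

Calling `load_pythia("70m")` should return a tokenizer and a causal language model that can be used with `model.generate(...)`; starting with the smallest size keeps the download manageable.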

(Also, Open-Assistant seems to be building on top of them.)