FallUpJV t1_jcje89y wrote
I don't get this anymore. If it's not the model size nor the transformer architecture, then what is it?
Were models just not trained enough / not trained on the right data?
xEdwin23x t1_jcjfnlj wrote
First, this is not a "small" model, so size DOES matter. It may not be hundreds of billions of parameters, but it's definitely not small imo.
Second, it always has been about data (astronaut pointing gun meme).
FallUpJV t1_jclpydo wrote
Yes, it's definitely not small. I meant compared to the models people have been paying the most attention to over the last few years, I guess.
The astronaut pointing gun meme is a good analogy, almost a scary one. I wonder how much we could improve existing models with simply better data.
MysteryInc152 t1_jclpjzi wrote
It's predicting language. As long as the architecture allows the model to properly learn to predict language, you're good to go.
turnip_burrito t1_jcoul9i wrote
Yes, exactly. Everyone keeps leaving the architecture's inductive structural priors out of the discussion.
It's not all about data! The model matters too!
satireplusplus t1_jcp6bu4 wrote
This model uses a "trick" to efficiently train RNNs at scale, and I still have to take a look to understand how it works. Hopefully the paper is out soon!
Otherwise, size is what matters! Getting there is a combination of factors: the transformer architecture scales well and was the first architecture that allowed these LLMs to be cranked up to enormous sizes, plus enterprise GPU hardware with lots of memory (40GB, 80GB) and frameworks like PyTorch that make parallelizing training across multiple GPUs easy.
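For the curious, a minimal sketch of the kind of multi-GPU data parallelism PyTorch makes easy. This is not the actual training code for this model; the tiny model, dummy loss, and hyperparameters are placeholders, and it assumes a launch via `torchrun --nproc_per_node=<num_gpus> train.py`:

```python
# Minimal PyTorch DistributedDataParallel sketch (placeholder model and loss).
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets LOCAL_RANK for each spawned process (one per GPU)
    local_rank = int(os.environ["LOCAL_RANK"])
    device = torch.device("cuda", local_rank)
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Placeholder model; in practice this would be the language model
    model = torch.nn.Linear(1024, 1024).to(device)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 1024, device=device)   # placeholder batch
        loss = model(x).pow(2).mean()              # dummy loss
        optimizer.zero_grad()
        loss.backward()   # DDP all-reduces gradients across GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each process sees a different shard of the data, and gradient averaging during `backward()` keeps the replicas in sync.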
And OP's 14B model might be "small" by today's standards, but it's still gigantic compared to a few years ago. It's ~27GB of FP16 weights (14B parameters at 2 bytes each).
Having access to 1TB of preprocessed text data that you can download right away without doing your own crawling is also neat (The Pile).