Submitted by AutoModerator t3_100mjlp in MachineLearning
oilfee t1_j2mki8y wrote
How much data do I need for a transformer model? If I'm not mistaken, GPT-3 was trained on something like 45 TB of raw text, a few hundred billion tokens after filtering? But maybe it gets 'decent' results with much less? I just don't want to fall into the trap of the small-business owner who hires a data scientist and asks her to use deep learning on their 130-entry customer database (which I've encountered before). But like, 1M tokens? 10M?
v2thegreat t1_j2o58pj wrote
Why do you need a transformer model? What problem are you trying to solve?
oilfee t1_j2o5l66 wrote
Can the question not be answered without knowing the data? I thought transformers were fairly domain-agnostic (as long as the data is sequential in some dimension)?
v2thegreat t1_j2o5v0y wrote
It can, but I want to know why you want to use a transformer in the first place. Having the full context is important to avoid solving the wrong problem, especially one that might get expensive depending on what you're trying to do.
oilfee t1_j2o7duv wrote
I do theoretical stuff. It doesn't really matter; I'm not going to sink a million TPU hours into it.
v2thegreat t1_j2o816v wrote
Well, to answer your original question: it depends on what problem you're trying to solve!
In theory, yes, you can throw a large corpus of data at a large language model, but as ChatGPT showed us, a larger model won't always do better; fine-tuning might give better results.
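To make that concrete, here's a minimal fine-tuning sketch, assuming the Hugging Face transformers/datasets libraries, distilgpt2, and wikitext-2 (my example picks, not anything specific to your problem):

```python
# Minimal fine-tuning sketch (illustration only; assumes `pip install transformers datasets`)
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token        # GPT-2 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

# wikitext-2 is only ~2M tokens, tiny by LLM standards, but enough to see fine-tuning move the loss
raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
raw = raw.filter(lambda x: len(x["text"].strip()) > 0)          # drop empty lines
tokenized = raw.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
)
trainer.train()
```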
I hope this helps!
oilfee t1_j2o8mba wrote
I'm interested in numbers, not "it depends". How much data, in bytes or tokens, would I need for
- text generation
- image generation
- sound generation
- function classes
- protein sequences
- chess games
to reach some sort of saturation of learnability, i.e. diminishing returns for a given architecture? Is it the same ballpark across these? Have different dataset sizes been compared against different model sizes?
v2thegreat t1_j2oablu wrote
For transformers that's likely a difficult question to answer without experimentation, but I always recommend starting small. Going from 0 to 1 is generally hard enough without also worrying about scaling things up.
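For a sense of what "small" can mean: the toy character-level transformer LM below (plain PyTorch, just my own sketch, not tailored to your problem) has well under a million parameters, and you can try to overfit it on a few KB of text before scale enters the picture:

```python
# Toy "start small" transformer LM (illustration only)
import torch
import torch.nn as nn

class TinyTransformerLM(nn.Module):
    def __init__(self, vocab_size=128, d_model=128, nhead=4, num_layers=2, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead,
                                           dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, x):                      # x: (batch, seq) of token ids
        seq_len = x.size(1)
        h = self.embed(x) + self.pos(torch.arange(seq_len, device=x.device))
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len).to(x.device)
        h = self.encoder(h, mask=causal)       # causal mask, i.e. next-token prediction
        return self.head(h)                    # (batch, seq, vocab_size) logits

model = TinyTransformerLM()
print(sum(p.numel() for p in model.parameters()))   # roughly half a million parameters
```

If a model this size can't learn anything useful from your data, more tokens and more parameters usually aren't the first thing to fix.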
Currently, we're seeing that larger and larger models aren't really slowing down and continue to become more powerful.
I'd say that this deserves its own post rather than a simple question.
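That said, if you want at least one ballpark for the text case: the Chinchilla scaling-law results (Hoffmann et al., 2022) land around 20 training tokens per model parameter for compute-optimal language models. A back-of-the-envelope under that heuristic (keeping in mind it's a language-modeling heuristic only and says nothing about images, audio, proteins, or chess):

```python
# Back-of-the-envelope only: Chinchilla-style ~20 tokens per parameter heuristic
TOKENS_PER_PARAM = 20
BYTES_PER_TOKEN = 4   # rough average for English text with a BPE tokenizer

for params in (10e6, 100e6, 1e9, 10e9, 175e9):
    tokens = TOKENS_PER_PARAM * params
    gb = tokens * BYTES_PER_TOKEN / 1e9
    print(f"{params:>15,.0f} params -> ~{tokens:>17,.0f} tokens (~{gb:,.0f} GB raw text)")
```

Even a ~100M-parameter model wants a couple of billion tokens under that rule of thumb, which is part of why fine-tuning a pretrained model is usually the more realistic path at the 1M-10M token scale.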
Good luck and please respond when you end up solving it!