oilfee t1_j2o7duv wrote
Reply to comment by v2thegreat in [D] Simple Questions Thread by AutoModerator
I do theoretical stuff. It doesn't really matter; I'm not going to sink a million TPU hours into it.
oilfee t1_j2o5l66 wrote
Reply to comment by v2thegreat in [D] Simple Questions Thread by AutoModerator
Can the question not be answered without knowing the data? I thought transformers were fairly data-agnostic (as long as the data is sequential in some dimension)?
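(For context, the "agnostic" intuition holds at the architecture level: a vanilla decoder only ever sees integer token IDs, so switching domains is mostly a matter of switching the tokenizer and vocabulary. A minimal PyTorch sketch, with purely illustrative layer sizes and vocabulary numbers:)

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Domain-agnostic autoregressive transformer: text, audio events,
    protein residues, or chess moves all enter as integer token IDs."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):  # ids: (batch, seq_len) of token indices
        seq_len = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq_len, device=ids.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        x = self.blocks(x, mask=mask)   # causal self-attention
        return self.head(x)             # next-token logits

# Same architecture, different domains -- only the vocabulary changes:
text_model  = TinyDecoder(vocab_size=50_257)  # e.g. a BPE text vocabulary
chess_model = TinyDecoder(vocab_size=4_672)   # e.g. an encoded chess-move vocabulary
```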
oilfee t1_j2mki8y wrote
Reply to [D] Simple Questions Thread by AutoModerator
How much data do I need for a transformer model? If I'm not mistaken, GPT-3 was trained on something like 300 billion tokens (filtered down from roughly 45 TB of raw Common Crawl text)? But maybe it gets 'decent' results with much less? I just don't want to fall into the trap of the small business owner who hires a data scientist and asks her to use deep learning on their 130-entry customer database (which I've encountered before). But like, 1M tokens? 10M?
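(For a rough sense of scale, here is a back-of-envelope sketch using the Chinchilla rule of thumb from Hoffmann et al. 2022, roughly 20 training tokens per model parameter for compute-optimal training. The model sizes and the ~4 bytes-per-token conversion below are illustrative assumptions, not a prescription:)

```python
# Back-of-envelope data requirements via the Chinchilla heuristic
# (Hoffmann et al. 2022: compute-optimal training uses roughly
# 20 tokens per model parameter). Sizes below are illustrative.

CHINCHILLA_TOKENS_PER_PARAM = 20  # rule-of-thumb ratio, not a hard law

example_models = {
    "small prototype (10M params)": 10_000_000,
    "GPT-2-scale (1.5B params)": 1_500_000_000,
    "GPT-3-scale (175B params)": 175_000_000_000,
}

for name, n_params in example_models.items():
    tokens = n_params * CHINCHILLA_TOKENS_PER_PARAM
    approx_gb = tokens * 4 / 1e9  # assume ~4 bytes of raw text per token
    print(f"{name}: ~{tokens:.2e} tokens (~{approx_gb:,.0f} GB of text)")
```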
oilfee t1_j2o8mba wrote
Reply to comment by v2thegreat in [D] Simple Questions Thread by AutoModerator
I'm interested in numbers, not "it depends". How much data in bytes or tokens would I need for
- text generation
- image generation
- sound generation
- function classes
- protein sequences
- chess games
to achieve some sort of saturation of learnability, i.e. diminishing returns for a given architecture? Is it the same ballpark? Have different dataset sizes been compared with different model sizes?
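(Comparing model sizes against dataset sizes is exactly what the scaling-law studies do, e.g. Kaplan et al. 2020 and Hoffmann et al. 2022. As a sketch of the kind of answer they give, the Chinchilla paper fits a parametric loss of roughly the following form; the exponents quoted are the paper's approximate fitted values:)

```latex
% Parametric loss fit from Hoffmann et al. (2022),
% "Training Compute-Optimal Large Language Models";
% N = number of parameters, D = number of training tokens.
\[
  L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad \alpha \approx 0.34,\; \beta \approx 0.28 .
\]
% Both terms decay as power laws, i.e. diminishing returns in both N and D;
% minimizing L under a fixed compute budget C \propto N D yields the
% rule of thumb D^{\ast} \approx 20\,N (about 20 tokens per parameter).
```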