oilfee t1_j2o7duv wrote
Reply to comment by v2thegreat in [D] Simple Questions Thread by AutoModerator
I do theoretical stuff. It doesn't really matter; I'm not going to sink a million TPU hours into it.
oilfee t1_j2o5l66 wrote
Reply to comment by v2thegreat in [D] Simple Questions Thread by AutoModerator
Can the question not be answered without knowing the data? I thought transformers were fairly data-agnostic (as long as the data is sequential in some dimension)?
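(For context, the "agnostic" intuition holds at the architecture level: a vanilla decoder only ever sees integer token IDs, so switching domains is mostly a matter of switching the tokenizer and vocabulary. A minimal PyTorch sketch, with purely illustrative layer sizes and vocabulary numbers:)

```python
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    """Domain-agnostic autoregressive transformer: text, audio events,
    protein residues, or chess moves all enter as integer token IDs."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):  # ids: (batch, seq_len) of token indices
        seq_len = ids.size(1)
        x = self.tok(ids) + self.pos(torch.arange(seq_len, device=ids.device))
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        x = self.blocks(x, mask=mask)   # causal self-attention
        return self.head(x)             # next-token logits

# Same architecture, different domains -- only the vocabulary changes:
text_model  = TinyDecoder(vocab_size=50_257)  # e.g. a BPE text vocabulary
chess_model = TinyDecoder(vocab_size=4_672)   # e.g. an encoded chess-move vocabulary
```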
oilfee t1_j2mki8y wrote
Reply to [D] Simple Questions Thread by AutoModerator
How much data do I need for a transformer model? If I'm not mistaken, GPT-3 was trained on something like 300 billion tokens (filtered down from roughly 45 TB of raw Common Crawl text)? But maybe it gets 'decent' results with much less? I just don't want to fall into the trap of the small business owner who hires a data scientist and asks her to use deep learning on their 130-entry customer database (which I've encountered before). But like, 1M tokens? 10M?
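(For a rough sense of scale, here is a back-of-envelope sketch using the Chinchilla rule of thumb from Hoffmann et al. 2022, roughly 20 training tokens per model parameter for compute-optimal training. The model sizes and the ~4 bytes-per-token conversion below are illustrative assumptions, not a prescription:)

```python
# Back-of-envelope data requirements via the Chinchilla heuristic
# (Hoffmann et al. 2022: compute-optimal training uses roughly
# 20 tokens per model parameter). Sizes below are illustrative.

CHINCHILLA_TOKENS_PER_PARAM = 20  # rule-of-thumb ratio, not a hard law

example_models = {
    "small prototype (10M params)": 10_000_000,
    "GPT-2-scale (1.5B params)": 1_500_000_000,
    "GPT-3-scale (175B params)": 175_000_000_000,
}

for name, n_params in example_models.items():
    tokens = n_params * CHINCHILLA_TOKENS_PER_PARAM
    approx_gb = tokens * 4 / 1e9  # assume ~4 bytes of raw text per token
    print(f"{name}: ~{tokens:.2e} tokens (~{approx_gb:,.0f} GB of text)")
```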
oilfee t1_j2o8mba wrote
Reply to comment by v2thegreat in [D] Simple Questions Thread by AutoModerator
I'm interested in numbers, not "it depends". How much data in bytes or tokens would I need for
- text generation
- image generation
- sound generation
- function classes
- protein sequences
- chess games
to achieve some sort of saturation of learnability, i.e. diminishing returns for a given architecture? Is it the same ballpark? Have different dataset sizes been compared with different model sizes?
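(Comparing model sizes against dataset sizes is exactly what the scaling-law studies do, e.g. Kaplan et al. 2020 and Hoffmann et al. 2022. As a sketch of the kind of answer they give, the Chinchilla paper fits a parametric loss of roughly the following form; the exponents quoted are the paper's approximate fitted values:)

```latex
% Parametric loss fit from Hoffmann et al. (2022),
% "Training Compute-Optimal Large Language Models";
% N = number of parameters, D = number of training tokens.
\[
  L(N, D) \;\approx\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad \alpha \approx 0.34,\; \beta \approx 0.28 .
\]
% Both terms decay as power laws, i.e. diminishing returns in both N and D;
% minimizing L under a fixed compute budget C \propto N D yields the
% rule of thumb D^{\ast} \approx 20\,N (about 20 tokens per parameter).
```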