Submitted by Vegetable-Skill-9700 t3_121agx4 in deeplearning
BellyDancerUrgot t1_jdx6w01 wrote
Reply to comment by ChingChong--PingPong in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The implication was that it was trained on most of the accessible textual data, which is true. The exaggeration reads that way because it's a language model first and foremost, and previous iterations like GPT-3 and 3.5 were not multimodal. Also, as far as public accounts of the training data go, that's a huge '?' at the moment, especially going by tweets like this one:
https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
The reality is, neither you nor I have the slightest clue what it was trained on, and MSFT has sufficient compute to train on all of the text data on the internet.
When it comes to multimodal media, you don't really need to train a model on the same amount of data that's required for text.