Submitted by Vegetable-Skill-9700 t3_121agx4 in deeplearning
BellyDancerUrgot t1_jdx6w01 wrote
Reply to comment by ChingChong--PingPong in Do we really need 100B+ parameters in a large language model? by Vegetable-Skill-9700
The implication was that it was trained on most of the accessible textual data, which is true. The exaggeration reads that way because it's a language model first and foremost, and previous iterations like GPT-3 and 3.5 were not multimodal. Also, as far as public accounts of the training data go, that's a huge '?' at the moment, especially going by tweets like this one:
https://twitter.com/katecrawford/status/1638524011876433921?s=46&t=kwpwSgfnJvGe6J-1CEe_5Q
The reality is, neither you nor I have the slightest clue what it was trained on, and MSFT has sufficient compute to train on all of the text data on the internet.
When it comes to multimodal media, you don't really need to train a model on the same amount of data that's required for text.