
visarga t1_itc1tbb wrote

U-PaLM and Flan-PaLM come to mind.

The first one shows we were noising the data suboptimally: mixing in different denoising objectives (UL2R) on top of an already-trained model gives major benefits. The second shows that instruction-tuning on thousands of tasks boosts the model's ability to follow instructions and also improves benchmark scores. So maybe OpenAI had to change their plans midway.
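To make the "different kind of noising" point concrete, here's a rough Python sketch of T5/UL2-style span corruption, the family of denoising objectives U-PaLM mixes back in. Purely illustrative, not the actual training code; the sentinel-token format is just an assumption borrowed from T5.

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3):
    """Mask random spans of the input; the model must reconstruct them.

    Returns (noised_input, target) as token lists, using T5-style
    sentinel tokens to mark where spans were removed.
    """
    n_to_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    while len(masked) < n_to_mask:
        start = random.randrange(len(tokens))
        for i in range(start, min(len(tokens), start + mean_span_len)):
            masked.add(i)

    noised, target, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            noised.append(f"<extra_id_{sentinel}>")
            target.append(f"<extra_id_{sentinel}>")
            while i < len(tokens) and i in masked:
                target.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            noised.append(tokens[i])
            i += 1
    return noised, target

noised, target = span_corrupt("the quick brown fox jumps over the lazy dog".split())
# e.g. noised: ['the', 'quick', '<extra_id_0>', 'over', 'the', 'lazy', 'dog']
#      target: ['<extra_id_0>', 'brown', 'fox', 'jumps']
```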

It's also possible that they don't want to scale up even further because it's very impractical: too expensive to serve, not just to train. And recent instruction-tuned models like Flan-T5 match GPT-3 on many (not all) tasks with just 3B parameters.
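Back-of-envelope on the serving cost point, using the standard ~2 FLOPs per parameter per generated token rule of thumb for a dense transformer forward pass (the model sizes beyond GPT-3 are my assumption, nothing public):

```python
# Rough rule of thumb: a dense transformer forward pass costs about
# 2 FLOPs per parameter per generated token (ignoring attention overhead).
def flops_per_token(params):
    return 2 * params

for name, params in [("3B (Flan-T5-XL-ish)", 3e9),
                     ("175B (GPT-3)", 175e9),
                     ("1T+ (hypothetical next scale-up)", 1e12)]:
    print(f"{name}: ~{flops_per_token(params) / 1e9:.0f} GFLOPs per token")

# 3B:   ~6 GFLOPs/token
# 175B: ~350 GFLOPs/token
# 1T+:  ~2000 GFLOPs/token -> serving cost grows linearly with model size
```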

There's also the question of training data: where can they get 10x or 100x more? I bet they transcribe videos, probably every video they can access. Another approach is to train on raw audio instead of text, which also works well. I bet they have a large team just for dataset building. BLOOM was managed by an organisation of about 1000 people, and a lot of their effort went into dataset sourcing and trying to reduce biases.
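On the video-transcription guess, a minimal sketch of that kind of pipeline using the open-source whisper package. Whether OpenAI actually does this, and at what scale, is pure speculation on my part; the file names are made up.

```python
# Sketch: turn a pile of audio/video files into plain-text training data
# with open-source Whisper. Purely illustrative; the paths are hypothetical.
# pip install openai-whisper
import whisper

model = whisper.load_model("base")  # larger checkpoints transcribe better

def transcribe_to_corpus(paths, out_file="corpus.txt"):
    with open(out_file, "a", encoding="utf-8") as out:
        for path in paths:
            result = model.transcribe(path)  # ffmpeg handles audio decoding
            out.write(result["text"].strip() + "\n")

transcribe_to_corpus(["talk_001.mp4", "podcast_042.mp3"])
```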
