SatisfyingLatte t1_ittn52b wrote

Once all the useful representations in the training data have been extracted and learned. Beyond that point, increasing model size will just overfit the training data. Only language tasks might be solvable by naively scaling current techniques.


ReasonablyBadass t1_ittryn7 wrote

Overfitting isn't an issue anymore due to the discovery of double descent/grokking.