qrayons t1_jcbi3sk wrote

I mean, Karpathy was kind of right. This is from his original post.

> I’ve seen some arguments that all we need is lots more data from images, video, maybe text and run some clever learning algorithm: maybe a better objective function, run SGD, maybe anneal the step size, use adagrad, or slap an L1 here and there and everything will just pop out. If we only had a few more tricks up our sleeves! But to me, examples like this illustrate that we are missing many crucial pieces of the puzzle and that a central problem will be as much about obtaining the right training data in the right form to support these inferences as it will be about making them.

One of the crucial pieces we were missing was attention. So much of the advancement we are seeing now is because of transformers.
