qrayons t1_jcbi3sk wrote on March 15, 2023 at 5:27 PM

I mean, Karpathy was kind of right. This is from his original post.

> I’ve seen some arguments that all we need is lots more data from images, video, maybe text and run some clever learning algorithm: maybe a better objective function, run SGD, maybe anneal the step size, use adagrad, or slap an L1 here and there and everything will just pop out. If we only had a few more tricks up our sleeves! But to me, examples like this illustrate that we are missing many crucial pieces of the puzzle and that a central problem will be as much about obtaining the right training data in the right form to support these inferences as it will be about making them.

One of the crucial pieces we were missing was attention. So much of the progress we are seeing now comes from transformers, which are built on it.

ugohome t1_jcevnrm wrote on March 16, 2023 at 10:10 AM

> transformers

justlurkin7 t1_jcfil84 wrote on March 16, 2023 at 1:51 PM

Transformer is the T in GPT (Generative Pre-trained Transformer).