If you want to take the top-down approach, I'd recommend starting by learning what transformers are. Transformers were originally developed for NLP (the original paper applied them to machine translation), so an NLP lecture series like Stanford CS224n covers them in detail from an NLP perspective; it should be helpful regardless. Alternatively, check out CS231n, which has a whole lecture on attention, transformers and ViT. Start there and look up whatever's unclear as you go.
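If it helps to see the core idea in code before diving into the lectures, here's a rough toy sketch (my own, not taken from CS224n or CS231n). It splits an image into patches, linearly embeds each patch, and runs single-head scaled dot-product self-attention over the patch tokens. Assumptions: a grayscale image stored as nested lists, random untrained weights, and helper names (`patchify`, `self_attention`, etc.) that are just made up for illustration. It leaves out the class token, positional embeddings, multi-head attention, layer norm and the MLP block that a real ViT has.

```python
import math
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of lists) into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            # Flatten the tile row by row into one vector (one "token" per patch)
            patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


def matmul(a, b):
    """Plain-Python matrix product of an n x k matrix and a k x m matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    kt = [list(col) for col in zip(*k)]  # transpose K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    weights = [softmax(row) for row in scores]  # each row: how much a patch attends to every patch
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Random (untrained) weight matrix, standing in for learned parameters."""
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 "image"
    tokens = patchify(image, patch=4)           # 4 patches, each a 16-dim vector
    d_model = 8
    w_embed = random_matrix(16, d_model, rng)   # linear patch embedding
    x = matmul(tokens, w_embed)                 # 4 tokens x 8 dims
    wq, wk, wv = (random_matrix(d_model, d_model, rng) for _ in range(3))
    out, weights = self_attention(x, wq, wk, wv)
    print(len(out), len(out[0]))  # 4 8
```

The point is that once the image is chopped into patches, ViT treats them exactly like word tokens in an NLP transformer, which is why the language-side lectures transfer so well.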
Lmk if you'd like me to link any other resources and I'll edit them into this comment later. Happy learning!
atharvat80 t1_j71u3oa wrote
Reply to [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad