Submitted by SAbdusSamad t3_10siibd in MachineLearning
Hello everyone,
I'm interested in diving into the field of computer vision, and I recently came across the Vision Transformer (ViT). I want to understand it in depth, but I'm not sure what prerequisites I need in order to grasp it fully.
Do I need a strong background in Recurrent Neural Networks (RNNs) and the original Transformer ("Attention Is All You Need") to understand ViT, or can I get by just knowing the basics of deep learning and Convolutional Neural Networks (CNNs)?
I would really appreciate it if someone could shed some light on this and provide some guidance.
Thank you in advance!
the_architect_ai t1_j71izep wrote
I suggest you just dive straight in. Part of learning is to find out what you don’t know and slowly cover your bases from there.