Submitted by SAbdusSamad t3_10siibd in MachineLearning
nicholsz t1_j728g2l wrote
Reply to comment by juanigp in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
OS. Kernel. Bus. Processor. Transistor. p-n junction.
juanigp t1_j73a6z4 wrote
That was my two cents: self-attention is a bunch of matrix multiplications, and with 12 layers of the same thing it makes sense to understand why QK^T (a minimal sketch follows below). If the question had been how to understand Mask R-CNN, the answer would have been different.
Edit: 12 layers in ViT-Base / BERT-Base
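For concreteness, here is a minimal sketch of single-head self-attention written as plain matrix multiplications in NumPy. The sequence length, dimensions, random weights, and variable names are illustrative assumptions, not taken from any particular ViT/BERT implementation; the point is just to show where the QK^T product sits.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes (assumptions): 4 tokens, model dim 8, single head for clarity.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.standard_normal((seq_len, d_model))     # token embeddings
W_q = rng.standard_normal((d_model, d_model))   # learned projection matrices
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

Q, K, V = X @ W_q, X @ W_k, X @ W_v             # three matrix multiplications
scores = Q @ K.T / np.sqrt(d_model)             # the QK^T term, scaled
attn = softmax(scores, axis=-1)                 # each row: how much a token attends to the others
out = attn @ V                                  # weighted sum of values, another matmul

print(attn.shape, out.shape)                    # (4, 4) (4, 8)
```

A Transformer encoder block then wraps this in output projections, residual connections, layer norm, and an MLP, and the "12 layers" in ViT-Base/BERT-Base are 12 repetitions of that same block.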