Submitted by SAbdusSamad t3_10siibd in MachineLearning
juanigp t1_j71p88u wrote
matrix multiplication, linear projections, dot product
nicholsz t1_j728g2l wrote
OS. Kernel. Bus. Processor. Transistor. p-n junction
juanigp t1_j73a6z4 wrote
It was my two cents: self-attention is a bunch of matrix multiplications. With 12 layers of the same thing, it makes sense to understand why QK^T (see the sketch below). If the question had been how to understand Mask R-CNN, the answer would have been different.
Edit: 12 layers in ViT base / BERT base
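A minimal NumPy sketch of what that comment describes: queries, keys, and values are linear projections of the input, and the attention weights come from the scaled dot product QK^T followed by a softmax. The shapes and parameter names (d_model, d_k, the toy sizes) are illustrative assumptions, not taken from the thread.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention as plain matrix multiplications.
    X: (seq_len, d_model) token embeddings; W_*: (d_model, d_k) projection matrices."""
    Q = X @ W_q                                   # linear projection to queries
    K = X @ W_k                                   # linear projection to keys
    V = X @ W_v                                   # linear projection to values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # QK^T: pairwise dot products, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # weighted sum of values

# Toy example: 4 tokens, d_model = d_k = 8 (sizes are arbitrary for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

A full ViT-Base or BERT-Base block repeats this (with multiple heads plus a feed-forward layer) 12 times, which is why the comment frames the whole model as repeated matrix multiplications.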