juanigp
juanigp t1_j71p88u wrote
matrix multiplication, linear projections, dot product
juanigp t1_j5swklo wrote
Reply to comment by Jack7heRapper in [D] CVPR Reviews are out by banmeyoucoward
This was my first submission; I had a worse score than you but will write a rebuttal either way (although I doubt I can convince everyone). Why wouldn't you? I'm not judging, just asking out of curiosity since I don't know the "common practice".
juanigp t1_ixgzh0t wrote
I was doing everything in plain PyTorch and then switched to Lightning to accomplish my goal more easily, and you still have room for "low level" (with 1000 quote marks) development; see the sketch below.
I think that implementing feature X and advancing with my research/work are two different, maybe equally exciting, tasks, and keeping them separate is more productive. If you don't, you end up implementing something close to a Lightning/mmcv/etc. clone, and hey, those already exist!
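A minimal sketch of what I mean by that "low level" room, assuming pytorch_lightning is installed; the module, layer sizes, and dataloader name are hypothetical, just to show that the forward pass, loss, and optimizer are still written by hand while Lightning owns the loop:

```python
import torch
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    """Hypothetical module: Lightning owns the training loop, the math stays yours."""

    def __init__(self):
        super().__init__()
        self.model = torch.nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        # The "low level" part: you still write the forward pass and the loss yourself.
        x, y = batch
        logits = self.model(x.view(x.size(0), -1))
        return torch.nn.functional.cross_entropy(logits, y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# trainer = pl.Trainer(max_epochs=1)
# trainer.fit(LitClassifier(), train_dataloader)  # train_dataloader is a hypothetical DataLoader
```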
juanigp t1_j73a6z4 wrote
Reply to comment by nicholsz in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
That was my two cents: self-attention is just a bunch of matrix multiplications, 12 layers of the same thing, so it makes sense to understand why QK^T (a minimal sketch follows below). If the question had been how to understand Mask R-CNN, the answer would have been different.
Edit: 12 layers in ViT base / BERT base
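To make "a bunch of matrix multiplications" concrete, here is a minimal single-head self-attention sketch in plain PyTorch; the weight names and shapes are my own illustration, not taken from any particular implementation:

```python
import torch

def self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model) token embeddings
    # w_q, w_k, w_v: (d_model, d_head) projection matrices (hypothetical names)
    q = x @ w_q                                   # linear projection -> queries
    k = x @ w_k                                   # linear projection -> keys
    v = x @ w_v                                   # linear projection -> values
    scores = q @ k.T                              # QK^T: all pairwise dot products between tokens
    weights = torch.softmax(scores / k.shape[-1] ** 0.5, dim=-1)  # scaled, row-wise softmax
    return weights @ v                            # weighted sum of values

# 16 tokens, 64-dim embeddings, 64-dim head
x = torch.randn(16, 64)
w_q, w_k, w_v = (torch.randn(64, 64) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape: (16, 64)
```

ViT-Base / BERT-Base stack 12 encoder layers built around exactly this (multi-head, plus MLPs), which is why the prerequisites really come down to matrix multiplication, linear projections, and dot products.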