VenerableSpace_

VenerableSpace_ t1_ixxy2dy wrote

Metric learning isn't really a new buzzword; it's been used for these kinds of approaches for several years now. It's a good framework for thinking about these approaches collectively, but there is some overlap: e.g., self-attention as a stand-alone layer can be viewed as a form of metric learning. In a ViT, for instance, the question is: how do we relate the input patch embeddings to one another such that we can discriminate between the classes?
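
As a rough illustration of that view (a minimal sketch, not actual ViT code; the names `patch_emb`, `w_q`, `w_k` and the single-head, no-value-projection setup are my own assumptions), the self-attention scores are just a learned similarity computed between every pair of patch embeddings:

```python
import torch

def attention_weights(patch_emb, w_q, w_k):
    # patch_emb: (num_patches, d) input patch embeddings
    # w_q, w_k:  (d, d_k) learned projection matrices
    q = patch_emb @ w_q
    k = patch_emb @ w_k
    # Each score is a bilinear similarity between two patches, i.e. a
    # learned "metric" on the embedding space.
    scores = (q @ k.T) / k.shape[-1] ** 0.5
    # Row i: how patch i relates to every patch (including itself).
    return torch.softmax(scores, dim=-1)
```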

2

VenerableSpace_ t1_iqo5opu wrote

Focal loss downweights "well-classified" examples. It happens that the minority class typically is not well classified because in a given mini-batch the average gradient will be dominated by the majority class.

Technically, focal loss downweights the losses for all examples; it just happens to downweight the loss of well-classified examples significantly more than poorly classified ones (I'm drawing a hard distinction between the two, but in reality it's a smooth downweighting).
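
For concreteness, here's a minimal sketch of that downweighting in PyTorch (binary case; gamma is the usual focusing parameter, and the function name/signature are just for illustration):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # targets: float tensor of 0s and 1s, same shape as logits.
    # Per-example cross-entropy, no reduction yet.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: the model's probability for the true class of each example.
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    # Modulating factor (1 - p_t)^gamma is below 1 for every example, so all
    # losses are downweighted, but well-classified examples (p_t near 1)
    # are downweighted far more than poorly classified ones.
    return ((1 - p_t) ** gamma * ce).mean()
```

With gamma = 0 this reduces to plain cross-entropy; increasing gamma shifts more of the gradient toward the hard (often minority-class) examples.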

3