VenerableSpace_

VenerableSpace_ t1_ixxy2dy wrote

Metric learning isn't really a new buzzword; it's been used for these kinds of approaches for several years now. It's a good framework for thinking about these approaches collectively, but there is some overlap: e.g., self-attention as a stand-alone layer can be viewed as a form of metric learning. In a ViT, for instance, the question is: how do we relate the input patch embeddings to one another such that we can discriminate between the classes?
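
As a rough illustration of that view (a minimal sketch, not actual ViT code; the names `patch_emb`, `w_q`, `w_k` and the single-head, no-value-projection setup are my own assumptions), the self-attention scores are just a learned similarity computed between every pair of patch embeddings:

```python
import torch

def attention_weights(patch_emb, w_q, w_k):
    # patch_emb: (num_patches, d) input patch embeddings
    # w_q, w_k:  (d, d_k) learned projection matrices
    q = patch_emb @ w_q
    k = patch_emb @ w_k
    # Each score is a bilinear similarity between two patches, i.e. a
    # learned "metric" on the embedding space.
    scores = (q @ k.T) / k.shape[-1] ** 0.5
    # Row i: how patch i relates to every patch (including itself).
    return torch.softmax(scores, dim=-1)
```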

2

VenerableSpace_ t1_iqo5opu wrote

Focal loss downweights "well-classified" examples. It happens that the minority class typically is not well classified because in a given mini-batch the average gradient will be dominated by the majority class.

Technically, focal loss downweights the losses for all examples; it just happens to downweight the loss of well-classified examples significantly more than poorly classified ones (I'm drawing a hard distinction between the two, but in reality it's a smooth downweighting).
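
For concreteness, here's a minimal sketch of that downweighting in PyTorch (binary case; gamma is the usual focusing parameter, and the function name/signature are just for illustration):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # targets: float tensor of 0s and 1s, same shape as logits.
    # Per-example cross-entropy, no reduction yet.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # p_t: the model's probability for the true class of each example.
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    # Modulating factor (1 - p_t)^gamma is below 1 for every example, so all
    # losses are downweighted, but well-classified examples (p_t near 1)
    # are downweighted far more than poorly classified ones.
    return ((1 - p_t) ** gamma * ce).mean()
```

With gamma = 0 this reduces to plain cross-entropy; increasing gamma shifts more of the gradient toward the hard (often minority-class) examples.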

3