Paedor t1_j5ur6tx wrote
Reply to comment by altmly in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
The trouble is that contrastive methods often compare elements within the same batch, instead of treating elements as independent the way pretty much all other ML does (batch norm being the other exception).
As a simple example with a toy version of contrastive learning: with a single batch of 2N, the loss can use all (2N)^2 = 4N^2 pairwise distances between batch elements, while with two accumulated batches of N it only ever sees 2·N^2 = 2N^2 pairs. Gradient accumulation halves the number of comparisons, so it isn't equivalent to a genuinely larger batch.
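A minimal sketch of the pair-count difference (PyTorch; `toy_contrastive_loss` is a made-up mean-distance stand-in for a real contrastive loss like InfoNCE, just to show which pairs each setup can see):

```python
import torch

def toy_contrastive_loss(z):
    # z: (B, d) batch of embeddings; the loss touches all B*B pairwise distances
    dists = torch.cdist(z, z)  # (B, B) pairwise distance matrix
    return dists.mean()        # toy stand-in, not a real contrastive objective

N, d = 16, 8
z = torch.randn(2 * N, d)

# One batch of 2N: the loss sees (2N)^2 = 4N^2 = 1024 pairs.
full = toy_contrastive_loss(z)

# Two accumulated batches of N: only 2 * N^2 = 512 pairs total,
# and no cross-batch pairs are ever compared.
acc = toy_contrastive_loss(z[:N]) + toy_contrastive_loss(z[N:])
```

The missing 2N^2 cross-batch pairs are exactly the negatives that gradient accumulation can never recover, which is why methods like MoCo keep a memory bank of past embeddings instead.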