Viewing a single comment thread. View all comments

squidward2022 t1_j5vmb95 wrote

(https://arxiv.org/pdf/2106.04156.pdf ) This was a cool paper from NeurIPS 2020 which aimed to theoretically explain the success of CL by relating it spectral clustering. They present a loss with a very similar form to InfoNCE, which they use for their theory. One of the plus sides found was it worked well with small batch sizes.

(https://arxiv.org/abs/2110.06848) I skimmed this work a while back, one of their main claims is that this approach works with small batch sizes.

2