Submitted by shingekichan1996 t3_10ky2oh in MachineLearning
This thread is dedicated to exploring the various techniques used in self-supervised contrastive learning that utilize standard batch sizes. I am seeking information on the current methods in this field, specifically those that do not rely on large batch sizes.
I am familiar with the SimSiam paper from Meta (Facebook AI Research), which uses a batch size of 256 across 8 GPUs. However, for individuals with limited resources such as myself, access to that many GPUs is not feasible. I am therefore interested in methods that work with smaller batch sizes on a single GPU, ideally ones suitable for training on 1024x1024 input images.
Additionally, I am curious about any more efficient architectures that have been developed in this field. This includes, but is not limited to, techniques used in natural language processing that may have applications in other areas of artificial intelligence.
Note: I posted the same question on the PyTorch forums and am reposting it here for wider reach.
IntelArtiGen t1_j5tijjx wrote
I managed to use SwAV on 1 GPU (8GB), batch size 240, 224x224 images, FP16, ResNet18.
Of course it works. The problem isn't the batch size itself but the trade-off between batch size and accuracy, and the accuracy was quite bad (still usable for my task, though). If 50% top-5 accuracy on ImageNet is OK for you, you can do it. But I'm not sure there are many tasks where that makes sense.
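To illustrate why batch size and accuracy are coupled in contrastive methods, here is a minimal pure-Python sketch of a SimCLR-style NT-Xent loss. Note that this is not SwAV (which compares cluster assignments rather than using pairwise negatives directly); it is swapped in only because it makes the dependence on batch size explicit. The function names and the temperature of 0.5 are assumptions chosen for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(view_a, view_b, temperature=0.5):
    """SimCLR-style NT-Xent loss (illustrative sketch, not SwAV).

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    Returns (mean loss over all 2n anchors, negatives per anchor).
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)  # 2n embeddings in total
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # the other view of the same image
        # Similarities to every other embedding: 1 positive + (2n - 2) negatives.
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n), 2 * n - 2

# Toy batch of 2 images with 2-dimensional embeddings.
loss, negatives = nt_xent([[1.0, 0.0], [0.0, 1.0]],
                          [[1.0, 0.0], [0.0, 1.0]])
print(loss, negatives)
```

In this formulation every anchor is contrasted against the 2n - 2 other embeddings in the same batch, so a batch of 240 images gives 478 negatives per anchor, while a batch of 4096 gives 8190. Fewer negatives generally means a weaker training signal, which is one plausible reading of the accuracy drop described above.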
Perhaps contrastive learning isn't the best fit for a single GPU. I'm not sure what the current SOTA is on this task.
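To relate the SwAV numbers above (batch size 240 at 224x224 on 8 GB) to the 1024x1024 images in the original question, here is a rough back-of-the-envelope sketch. It assumes activation memory scales in proportion to the number of input pixels, which is only an approximation for convolutional networks and ignores weights, optimizer state, and framework overhead. The helper name `scaled_batch_size` is hypothetical.

```python
def scaled_batch_size(batch_size, side, new_side):
    """Rough batch size that fits in the same memory at a new square input size.

    Assumption: activation memory is proportional to the pixel count.
    Weights, optimizer state, and framework overhead are ignored.
    """
    pixel_ratio = (new_side / side) ** 2
    return max(1, int(batch_size / pixel_ratio))

# Batch size 240 at 224x224, rescaled to 1024x1024 inputs.
print(scaled_batch_size(240, 224, 1024))  # prints 11
```

Under that assumption, the same 8 GB card would hold a batch of only about 11 images at 1024x1024, which would leave very few in-batch samples to contrast against.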