Submitted by External_Oven_6379 in MachineLearning
NonFocusNorm wrote
I believe robust backbone models are crucial, since they act as feature extractors and determine how good your embeddings are. So I suggest trying CLIP from OpenAI, a very strong ("OP") model that works well for zero-shot learning tasks. I use it personally, and it surprisingly outperformed other models on a text-image retrieval task, so I highly recommend trying it out.
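To illustrate how such embeddings get used for text-image retrieval, here is a minimal sketch of the ranking step only. It assumes you have already computed embeddings with CLIP's text and image encoders (for example via Hugging Face's `CLIPModel.get_text_features` / `get_image_features`). The toy 3-d vectors and the `rank_images` helper are hypothetical stand-ins, not part of CLIP itself. CLIP compares text and image embeddings by cosine similarity, so the sketch L2-normalizes both sides and sorts by the dot product.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_images(
    text_embedding: Sequence[float],
    image_embeddings: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k image ids ranked by cosine similarity to the text query.

    Hypothetical helper: embeddings are assumed to come from a backbone
    such as CLIP; here they are plain lists of floats.
    """
    query = l2_normalize(text_embedding)
    scored = []
    for image_id, emb in image_embeddings.items():
        if len(emb) != len(query):
            raise ValueError(f"dimension mismatch for {image_id}")
        img = l2_normalize(emb)
        score = sum(q * i for q, i in zip(query, img))
        scored.append((image_id, score))
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real CLIP embeddings (hypothetical values).
    images = {
        "dog.jpg": [0.9, 0.1, 0.0],
        "cat.jpg": [0.1, 0.9, 0.0],
        "car.jpg": [0.0, 0.1, 0.9],
    }
    query = [1.0, 0.2, 0.0]  # pretend text embedding for "a photo of a dog"
    for image_id, score in rank_images(query, images, top_k=2):
        print(f"{image_id}: {score:.3f}")
```

In practice you would normalize the image embeddings once and store them (or put them in a vector index) rather than renormalizing on every query.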