Submitted by External_Oven_6379 t3_yd0549 in MachineLearning
Dear-Acanthisitta698 t1_itpqu2j wrote
Reply to comment by External_Oven_6379 in Combining image and text embedding [P] by External_Oven_6379
I think the problem is concatenating the visual and text features. Since the dimension of the text feature is a lot smaller than that of the visual feature, the text information might get washed out. So you could follow LastVariation's idea (first get images with the same category, then search within them) or scale up the text vector (maybe multiply it by 80; this is a hyperparameter).
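A rough sketch of the scaling option, assuming NumPy and plain 1-D embedding vectors. The function names and the `text_weight` parameter are made up for illustration, and L2-normalizing each part before weighting is an extra assumption on my side (it makes the weight independent of the raw dimensions), not something stated above. Without that normalization you would multiply the raw text vector directly, which is where a larger factor such as 80 might come in.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale a vector to unit length; eps guards against division by zero.
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)

def combine_embeddings(image_vec, text_vec, text_weight=1.0):
    # Assumption: normalize each part so neither dominates just because
    # it has more dimensions, then up-weight the text part.
    image_part = l2_normalize(np.asarray(image_vec, dtype=np.float64))
    text_part = l2_normalize(np.asarray(text_vec, dtype=np.float64)) * text_weight
    return np.concatenate([image_part, text_part], axis=-1)

def cosine_similarity(a, b):
    # Standard cosine similarity between two 1-D vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data: identical image features, completely different text features.
image = np.ones(8)
text_a = np.array([1.0, 0.0])
text_b = np.array([0.0, 1.0])

low = cosine_similarity(combine_embeddings(image, text_a, 1.0),
                        combine_embeddings(image, text_b, 1.0))
high = cosine_similarity(combine_embeddings(image, text_a, 3.0),
                         combine_embeddings(image, text_b, 3.0))
print(low, high)  # the larger text weight makes the text mismatch count more
```

With both parts normalized, the combined cosine similarity works out to (image_cos + w^2 * text_cos) / (1 + w^2), so `text_weight` directly controls how much a text mismatch pulls the score down. The right value still has to be tuned on your own data.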