hermlon t1_j295dp1 wrote on December 30, 2022 at 3:32 PM

This is a really cool idea. I'm currently using the CLIP model for an image retrieval task at university. We're using a Ball Tree to find the images closest to the text in the vector space. What algorithm are you using to find the nearest neighbors?

RingoCatKeeper OP t1_j295uu0 wrote on December 30, 2022 at 3:36 PM

I'm using simple cosine similarity between the embedding vectors. There is some optimized work from Google called ScaNN, which is much faster for large-scale vector similarity search. However, it would be much more complicated to port it to iOS.
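A minimal Python sketch of the brute-force cosine-similarity search described above, assuming the CLIP text and image embeddings have already been computed and are available as plain lists of floats. The helper names `build_index` and `search` are hypothetical, not taken from the OP's app. Image embeddings are normalized once up front, so each query reduces to one dot product per image.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain definition: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_index(image_embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical helper: normalize every image embedding once, up front."""
    return [l2_normalize(e) for e in image_embeddings]


def search(
    index: List[List[float]], text_embedding: Sequence[float], k: int = 5
) -> List[Tuple[int, float]]:
    """Exhaustive search: score every image against the query, keep the k best."""
    query = l2_normalize(text_embedding)
    # For unit vectors, cosine similarity is just the dot product.
    scored = ((i, sum(q * v for q, v in zip(query, img))) for i, img in enumerate(index))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real CLIP embeddings are much higher-dimensional.
    index = build_index([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    print(search(index, [1.0, 0.1], k=2))  # image 0 ranks first, then image 2
```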

hermlon t1_j29a939 wrote on December 30, 2022 at 4:05 PM

So for every query you go through all the images and compute the cosine similarity between each one and the text?

RingoCatKeeper OP t1_j29ah9n wrote on December 30, 2022 at 4:07 PM

Right.
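To relate this exhaustive scan to the Ball Tree mentioned in the first comment: assuming the embeddings are L2-normalized, squared Euclidean distance is a decreasing function of cosine similarity, so a Euclidean nearest-neighbor structure should return the same ranking as the cosine scan. Here N (number of images) and d (embedding dimension) are symbols introduced for illustration.

```latex
\begin{align*}
\|\mathbf{u}-\mathbf{v}\|^2 &= \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 - 2\,\mathbf{u}\cdot\mathbf{v}
  = 2 - 2\cos(\mathbf{u},\mathbf{v}) \qquad \text{for } \|\mathbf{u}\| = \|\mathbf{v}\| = 1 \\
\text{brute-force cost per query} &= N \cdot d \ \text{multiply-adds}
\end{align*}
```

The scan's cost grows linearly with the size of the photo library, which is the cost that tree structures and approximate methods such as ScaNN try to reduce on large collections.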