Submitted by RingoCatKeeper t3_zypzrv in MachineLearning
hermlon t1_j295dp1 wrote
This is a really cool idea. I'm currently using the CLIP model for an image retrieval task at university. We're using a ball tree to find the images closest to the text in the vector space. What algorithm are you using for finding the nearest neighbors?
RingoCatKeeper OP t1_j295uu0 wrote
I'm using simple cosine similarity between the embedding vectors. There is some optimized work by Google called ScaNN, which is much faster for large-scale vector similarity search. However, it's much more complicated to port to iOS.
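For reference, the cosine similarity between a text embedding and an image embedding is their dot product divided by the product of their norms. A minimal Python sketch of that formula, not the app's actual code (the function name and the zero-vector convention are assumptions made for illustration):

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_a * norm_b)
```

The result ranges from -1 (opposite directions) to 1 (same direction), so a higher score means the image embedding points closer to the text embedding.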
hermlon t1_j29a939 wrote
So you go through all the images each time and compute the cosine similarity between each one and the text?
RingoCatKeeper OP t1_j29ah9n wrote
Right.
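That makes every query a linear scan: for N images with d-dimensional embeddings, each search costs roughly N times d multiplications. A rough Python sketch of such a scan, assuming the image embeddings are L2-normalized once up front so that each comparison reduces to a dot product (all names and the dictionary-based index are hypothetical, not taken from the app):

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def build_index(image_embeddings):
    """Normalize every image embedding once. Input: dict of image id -> vector."""
    return {image_id: normalize(vec) for image_id, vec in image_embeddings.items()}


def search(index, text_embedding, k=3):
    """Scan every image and return the k best (image_id, score) pairs."""
    query = normalize(text_embedding)
    # With unit-length vectors, the dot product equals the cosine similarity.
    scored = (
        (sum(q * x for q, x in zip(query, vec)), image_id)
        for image_id, vec in index.items()
    )
    # nlargest keeps only k candidates in memory while scanning all N images.
    return [(image_id, score) for score, image_id in heapq.nlargest(k, scored)]
```

Usage would look like `search(build_index({"cat": [1.0, 0.0], "dog": [0.0, 1.0]}), [0.9, 0.1], k=1)`. A scan like this is exact and simple to port; tree-based or approximate indexes trade that simplicity for lower query cost on large collections.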