
hermlon t1_j295dp1 wrote

This is a really cool idea. I'm currently using the CLIP model for an image retrieval task at university. We're using a ball tree to find the images closest to the text in the embedding space. What algorithm are you using to find the nearest neighbors?
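
For comparison, here is a minimal from-scratch ball tree sketch (stdlib only; `BallTree`, `normalize`, and `dist` are hypothetical names, and in practice scikit-learn's `sklearn.neighbors.BallTree` does this job). It assumes the CLIP embeddings are normalized to unit length first, so Euclidean distance ranks neighbors the same way cosine similarity does.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    # Scale to unit length; assumes v is not the zero vector.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class _Node:
    def __init__(self, idx: List[int], pts: List[List[float]], leaf_size: int):
        dim = len(pts[0])
        self.idx = idx
        # Ball = centroid of this node's points plus distance to the farthest one.
        self.center = [sum(pts[i][d] for i in idx) / len(idx) for d in range(dim)]
        self.radius = max(dist(self.center, pts[i]) for i in idx)
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None
        if len(idx) > leaf_size:
            # Split at the median of the dimension with the largest spread.
            def spread(d: int) -> float:
                vals = [pts[i][d] for i in idx]
                return max(vals) - min(vals)

            axis = max(range(dim), key=spread)
            order = sorted(idx, key=lambda i: pts[i][axis])
            mid = len(order) // 2
            self.left = _Node(order[:mid], pts, leaf_size)
            self.right = _Node(order[mid:], pts, leaf_size)


class BallTree:
    def __init__(self, vectors: Sequence[Sequence[float]], leaf_size: int = 4):
        if leaf_size < 1:
            raise ValueError("leaf_size must be at least 1")
        self.pts = [normalize(v) for v in vectors]
        self.root = _Node(list(range(len(self.pts))), self.pts, leaf_size)

    def query(self, q: Sequence[float], k: int = 1) -> List[Tuple[float, int]]:
        """Return the k nearest (distance, index) pairs, closest first."""
        qn = normalize(q)
        best: List[Tuple[float, int]] = []  # max-heap of (-distance, index)

        def visit(node: _Node) -> None:
            # No point inside the ball can be closer than this lower bound.
            lower = dist(qn, node.center) - node.radius
            if len(best) == k and lower >= -best[0][0]:
                return  # prune: this ball cannot beat the current k-th best
            if node.left is None or node.right is None:
                for i in node.idx:
                    d = dist(qn, self.pts[i])
                    if len(best) < k:
                        heapq.heappush(best, (-d, i))
                    elif d < -best[0][0]:
                        heapq.heapreplace(best, (-d, i))
                return
            # Descend into the closer child first so pruning kicks in sooner.
            for child in sorted((node.left, node.right), key=lambda c: dist(qn, c.center)):
                visit(child)

        visit(self.root)
        return sorted((-neg, i) for neg, i in best)
```

The pruning step is what makes this faster than a full scan on average: whole balls are skipped when even their nearest possible point is farther than the current k-th best match.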


RingoCatKeeper OP t1_j295uu0 wrote

I'm using simple cosine similarity between the embedding vectors. Google has an optimized library called ScaNN that is much faster for large-scale vector similarity search, but it's much more complicated to port to iOS.
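
A minimal sketch of brute-force cosine-similarity search in Python (the helper names are hypothetical, and normalizing the image embeddings once at indexing time is an assumption, not necessarily what the app does). With unit vectors, cosine similarity reduces to a dot product, so each query is a single pass over the index.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def unit(v: Sequence[float]) -> List[float]:
    # Scale to unit length; assumes v is not the zero vector.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def build_index(image_embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    # Normalize once, up front, so queries only need dot products.
    return [unit(e) for e in image_embeddings]


def search(index: List[List[float]], text_embedding: Sequence[float], k: int = 5) -> List[Tuple[float, int]]:
    q = unit(text_embedding)
    # Linear scan: one dot product (= cosine similarity of unit vectors) per image.
    scores = ((sum(a * b for a, b in zip(q, img)), i) for i, img in enumerate(index))
    # Keep only the k highest-scoring (similarity, index) pairs, best first.
    return heapq.nlargest(k, scores)
```

`heapq.nlargest` avoids sorting every score when only the top k matches are needed.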


hermlon t1_j29a939 wrote

So you go through all the images every time and compute the cosine similarity between each one and the text?
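
A rough sketch of what such a linear scan costs, plus the identity that links it to a Euclidean index like a ball tree. N = 10,000 is a hypothetical collection size; d = 512 is the embedding size of CLIP ViT-B/32. The first line shows that for unit-normalized embeddings, ranking by Euclidean distance and ranking by cosine similarity give the same order.

```latex
% For unit-normalized embeddings a, b (\|a\| = \|b\| = 1):
\|a - b\|^2 = \|a\|^2 + \|b\|^2 - 2\,a \cdot b = 2 - 2\cos(a, b)

% Linear-scan cost per text query over N images with d-dimensional embeddings:
\text{cost} = N d \ \text{multiply-adds}, \qquad N = 10{,}000,\ d = 512 \;\Rightarrow\; 5{,}120{,}000
```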
