mostlyhydrogen OP t1_j7km5j2 wrote on February 7, 2023 at 2:09 PM

Reply to comment by YOLOBOT666 in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

Thanks for the offer! This is a work project, though. I'm working with images. I can't give too many details due to confidentiality, but we're at sub-billion-image scale.

Usability is determined by trained annotators. If they find an object of interest and want to harvest more training data, they do a reverse image search across the whole training data and tag true matches.

mostlyhydrogen OP t1_j7fydvb wrote on February 6, 2023 at 2:57 PM

Reply to comment by YOLOBOT666 in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

The goal is to harvest training data for ML. If there is a difficult edge case the model is struggling with, the best way to improve model performance is to harvest additional training data for that edge case. You stop when the model performance meets your requirements.

mostlyhydrogen OP t1_j7fxwyx wrote on February 6, 2023 at 2:53 PM

Reply to comment by RingoCatKeeper in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

>ScaNN interface features

Nope. Notice that the results have shape (10000, 20) instead of (20,). That is just a batched query, i.e. "for each of these 10k input vectors, find me 20 neighbors". What I need is a joint query, i.e. "given these 10k positive examples, give me an additional 20 candidate samples".
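
To make the distinction concrete, here is a minimal brute-force NumPy sketch (not ScaNN). The function names are hypothetical, the sizes are scaled down from 10k positives to 100, and the joint scoring rule (mean cosine similarity to the positive set) is only one assumed way to define a joint query.

```python
import numpy as np

def normalize(x):
    # Scale each row to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def batched_query(index, queries, k):
    # One independent search per query vector: result has shape (n_queries, k).
    scores = queries @ index.T
    return np.argsort(-scores, axis=1)[:, :k]

def joint_query(index, positives, k):
    # One search for the whole positive set: result has shape (k,).
    # Assumption: a candidate's score is its mean similarity to all positives.
    scores = (positives @ index.T).mean(axis=0)
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
index = normalize(rng.normal(size=(1000, 64)))      # synthetic stand-in database
positives = normalize(rng.normal(size=(100, 64)))   # synthetic positive examples

batched = batched_query(index, positives, 20)
joint = joint_query(index, positives, 20)
print(batched.shape, joint.shape)  # (100, 20) (20,)
```

Under this scoring rule the joint query is equivalent to a single search with the mean of the positive vectors, so it could in principle be served by any ANN index that accepts one query vector.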

mostlyhydrogen OP t1_j73k4xe wrote on February 3, 2023 at 8:36 PM

Reply to comment by YOLOBOT666 in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

Not exactly. I have millions of points, most of which are not related to my query vectors. I want to iteratively refine my search: search, mark results as "relevant" or "irrelevant", repeat search with updated query.
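
One classical technique that matches this loop is Rocchio-style relevance feedback. The sketch below is an illustration under assumptions, not something confirmed in this thread: the weights `alpha`, `beta`, `gamma` are hypothetical defaults, the search is brute force, and the annotator is replaced by a placeholder rule.

```python
import numpy as np

def normalize(x):
    # Scale to unit length along the last axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def search(index, query, k):
    # Brute-force top-k by cosine similarity (all vectors are unit length).
    return np.argsort(-(index @ query))[:k]

def rocchio_update(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    # Move the query toward results marked relevant and away from irrelevant ones.
    updated = alpha * query
    if len(relevant):
        updated = updated + beta * relevant.mean(axis=0)
    if len(irrelevant):
        updated = updated - gamma * irrelevant.mean(axis=0)
    return normalize(updated)

rng = np.random.default_rng(0)
index = normalize(rng.normal(size=(5000, 32)))  # synthetic stand-in database
target = normalize(rng.normal(size=32))         # hidden direction used to fake labels
query = index[0]

for _ in range(3):
    hits = search(index, query, 20)
    # Placeholder for the human annotator: "relevant" if aligned with target.
    labels = index[hits] @ target > 0
    query = rocchio_update(query, index[hits][labels], index[hits][~labels])
```

In practice the `labels` line would be replaced by the annotators' relevant/irrelevant tags, and the loop would stop once enough true matches have been harvested.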

mostlyhydrogen OP t1_j727t8z wrote on February 3, 2023 at 3:30 PM

Reply to comment by Kacper-Lukawski in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

Thanks for the link!

mostlyhydrogen OP t1_j724ctr wrote on February 3, 2023 at 3:07 PM

Reply to comment by RingoCatKeeper in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

That was an interesting read, but I don't think it solves my problem. Their examples don't show joint vector searches: https://github.com/google-research/google-research/blob/master/scann/docs/example.ipynb

mostlyhydrogen OP t1_j723ya3 wrote on February 3, 2023 at 3:04 PM

Reply to comment by nobody202342 in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

What about marking samples as "irrelevant"?

mostlyhydrogen OP t1_j723us3 wrote on February 3, 2023 at 3:03 PM

Reply to comment by BiryaniSenpai in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

What does it mean for a vector to attend to another vector?

mostlyhydrogen OP t1_j7238p8 wrote on February 3, 2023 at 2:59 PM

Reply to comment by linverlan in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

As you probably know, ANN search often returns irrelevant data. How might I iteratively refine the search with human feedback, i.e. marking samples as "relevant" or "irrelevant" and repeating the search?

I've done a lit search and haven't found anything, maybe because I am using the wrong keywords.

mostlyhydrogen OP t1_j70koyk wrote on February 3, 2023 at 5:05 AM

Reply to comment by nobody202342 in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen

No, because the embeddings are on a unit hypersphere. But taking the average vector on the surface of the hypersphere might work.
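
A minimal sketch of that idea, assuming "average on the surface" means taking the ordinary mean and re-projecting it onto the unit sphere (as opposed to a geodesic mean). The function name `spherical_mean` is hypothetical.

```python
import numpy as np

def spherical_mean(vectors):
    # Euclidean mean of unit vectors, rescaled back to unit length.
    mean = vectors.mean(axis=0)
    norm = np.linalg.norm(mean)
    if norm == 0:
        # E.g. two antipodal vectors: no direction is preferred.
        raise ValueError("mean vector is zero; spherical mean is undefined")
    return mean / norm

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

plain = np.stack([a, b]).mean(axis=0)   # [0.5, 0.5], which lies inside the sphere
proj = spherical_mean(np.stack([a, b])) # same direction, but with unit length
print(np.linalg.norm(plain), np.linalg.norm(proj))
```

The plain mean has norm below 1, which is why it does not lie on the hypersphere; the projected mean points in the same direction and can be used directly as a query vector.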