Submitted by mostlyhydrogen t3_10rvkru in MachineLearning
YOLOBOT666 t1_j72cncj wrote
Iterative as in continuing until there are no neighbours left, as you keep adding neighbours to your index and querying again?
mostlyhydrogen OP t1_j73k4xe wrote
Not exactly. I have millions of points, most of which are not related to my query vectors. I want to iteratively refine my search: search, mark the results as "relevant" or "irrelevant", then repeat the search with an updated query.
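The search / mark / re-search loop described here is essentially relevance feedback. Below is a minimal sketch, assuming the classic Rocchio update. That is a swapped-in technique, since the OP doesn't say what they actually use. A brute-force cosine scan over a small in-memory dict stands in for a real ANN index, and the function names `search`, `rocchio` and `centroid` are hypothetical:

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def search(query, database, k, exclude=frozenset()):
    """Brute-force top-k by cosine similarity, skipping ids already labelled.

    Stand-in for an ANN index query; at millions of points you would
    query an index instead of scanning every vector.
    """
    q = normalize(query)
    scored = []
    for idx, vec in database.items():
        if idx in exclude:
            continue
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), idx))
    scored.sort(reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


def centroid(vecs, dim):
    """Mean of a list of vectors, or the zero vector if the list is empty."""
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward relevant results and away from irrelevant ones."""
    dim = len(query)
    rel = centroid(relevant, dim)
    irr = centroid(irrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, irr)]


# Toy 2-D "embeddings"; real image embeddings are much higher-dimensional.
db = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
query = [0.6, 0.8]
labelled = set()

first = search(query, db, k=2)  # results shown to the annotator
relevant, irrelevant = ["d"], ["c"]  # hypothetical annotator labels
labelled.update(relevant + irrelevant)
query = rocchio(query, [db[i] for i in relevant], [db[i] for i in irrelevant])
second = search(query, db, k=1, exclude=labelled)  # next round, unseen items only
print(first, second)
```

Excluding already-labelled ids keeps each round from re-showing items the annotator has already judged. The weights alpha=1, beta=0.75, gamma=0.15 are common textbook defaults, not tuned values.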
YOLOBOT666 t1_j75i86x wrote
Out of curiosity, what are you trying to achieve? When does the iterative process stop, and what heuristics decide that? I'd appreciate it if you could share some papers on this!
mostlyhydrogen OP t1_j7fydvb wrote
The goal is to harvest training data for ML. If there is a difficult edge case the model is struggling with, the best way to improve model performance is to harvest additional training data for that edge case. You stop when the model performance meets your requirements.
YOLOBOT666 t1_j7iov1k wrote
Nice! I guess the heuristic part is how you use the queries at each iteration and make them “usable” in your iterative approach. What’s the size and dimensionality of your dataset? These graph-based ANN indexes are memory-intensive, so I’m wondering what you can do at your dimensionality.
If it’s a public repo, or you’re planning to release it on GitHub, I’d be happy to join!
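Here is a back-of-the-envelope estimate that puts a number on "memory-intensive". It is a rough sketch under several assumptions. The graph is HNSW-style and stores the raw float32 vectors plus about 2 * M four-byte neighbour ids per vector on its base layer; upper layers and allocator overhead are ignored. For comparison, it also assumes product-quantization (PQ) codes of one byte per sub-quantizer, which is a swapped-in compression technique. The OP's dimensionality is confidential, so the 500M vectors, 512 dimensions, M = 16 and 64 sub-quantizers below are hypothetical, as are the function names:

```python
def hnsw_memory_bytes(n_vectors, dim, m, bytes_per_float=4, bytes_per_link=4):
    """Rough HNSW-style footprint: raw vectors plus ~2*M neighbour ids each.

    Ignores upper graph layers, metadata and allocator overhead.
    """
    per_vector = dim * bytes_per_float + 2 * m * bytes_per_link
    return n_vectors * per_vector


def pq_code_bytes(n_vectors, n_subquantizers, bits_per_code=8):
    """Storage for product-quantization codes alone (codebooks not included)."""
    return n_vectors * n_subquantizers * bits_per_code // 8


# Hypothetical sizes: the real dataset dimensions were not disclosed.
n, d = 500_000_000, 512
print(f"Graph + float32 vectors: {hnsw_memory_bytes(n, d, m=16) / 1e9:.0f} GB")  # → 1088 GB
print(f"PQ, 64 x 8-bit codes: {pq_code_bytes(n, 64) / 1e9:.0f} GB")  # → 32 GB
```

The gap between the two figures is why compressed indexes such as PQ are a common choice at this scale, at the cost of some recall.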
mostlyhydrogen OP t1_j7km5j2 wrote
Thanks for the offer! This is a work project, though. I'm working with images. I can't give too many details due to confidentiality, but we're at sub-billion-image scale.
Usability is determined by trained annotators. If they find an object of interest and want to harvest more training data, they do a reverse image search across the whole training data and tag true matches.