Keiny t1_jd2vgtk wrote
Reply to [D] Determining quality of training images with some metrics by i_sanitize_my_hands
Someone suggested active learning, but it may be more suitable to look into the subfield of data valuation.
Data valuation broadly aims to assign each data point a value that reflects its contribution to a model’s overall performance. Many methods build on game-theoretic solution concepts such as the Shapley value, which makes them very expensive to compute exactly, since the model has to be evaluated on many subsets of the training data. For practical settings, I would suggest the kNN-based Shapley surrogate by Jia et al. (2019) or LAVA by Just et al. (2023).
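To give a rough idea of what the kNN surrogate looks like in practice, here is a minimal sketch of the closed-form recursion for an unweighted kNN classifier, as I understand it from Jia et al. (2019). The function name `knn_shapley`, the Euclidean distance, and the default `k=5` are my own choices for illustration, not something taken from the paper's code, so please check it against the paper before relying on it. For images you would typically pass feature embeddings (e.g. from a pretrained network) rather than raw pixels.

```python
import numpy as np


def knn_shapley(X_train, y_train, X_test, y_test, k=5):
    """Sketch of kNN-Shapley values for an unweighted kNN classifier.

    Hypothetical helper for illustration: returns one value per training
    point, averaged over all test points. Higher means more helpful.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    y_test = np.asarray(y_test)

    n = len(X_train)
    values = np.zeros(n)

    for x, y in zip(X_test, y_test):
        # Sort training points from nearest to farthest (assumption: Euclidean).
        dists = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(dists, kind="stable")
        match = (y_train[order] == y).astype(float)

        # Recursion runs from the farthest point back to the nearest one.
        s = np.zeros(n)
        s[n - 1] = match[n - 1] / n
        for i in range(n - 2, -1, -1):
            rank = i + 1  # 1-based rank of the point in the sorted order
            s[i] = s[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank

        # Map the values back to the original training indices.
        values[order] += s

    return values / len(X_test)


if __name__ == "__main__":
    # Toy data: two points labelled 0 near the test point, one far point labelled 1.
    X_tr = [[0.0], [1.0], [10.0]]
    y_tr = [0, 0, 1]
    print(knn_shapley(X_tr, y_tr, [[0.1]], [0], k=1))
```

On the toy data, the two points whose label matches the test label split the credit equally (0.5 each) and the far, mismatched point gets 0. On a real dataset, the training images with the lowest (often negative) values are the first candidates to inspect for label noise or poor quality.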
You can find more papers at the GitHub repo awesome-data-valuation.
Hope that helps!