Keiny t1_jd2vgtk wrote on March 21, 2023 at 1:24 PM

Someone suggested active learning, but it may be more suitable to look into the subfield of data valuation.

Data valuation broadly aims to assign each data point a value that represents its contribution to a model’s overall performance. Many methods are based on game-theoretic solution concepts such as the Shapley value, and are therefore very expensive to compute: an exact Shapley value averages a point's marginal contribution over every subset of the remaining data. In practical settings, I would suggest the Shapley-over-kNN surrogate by Jia et al. (2019) or LAVA by Just et al. (2023).
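To give a feel for why the kNN surrogate is cheap, here is a minimal sketch of the closed-form recursion for a single test point, as I understand it from Jia et al. (2019). It assumes an unweighted kNN classifier whose utility is the fraction of the k nearest neighbors that carry the test label. The function name `knn_shapley` and the toy data are my own, not from the paper or any library.

```python
from math import dist


def knn_shapley(train_x, train_y, test_x, test_y, k):
    """Shapley value of each training point for ONE test point, assuming an
    unweighted kNN utility (fraction of the k nearest labels that match)."""
    n = len(train_x)
    # Rank training points from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda i: dist(train_x[i], test_x))
    # 1.0 where the ranked point's label agrees with the test label.
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]

    values = [0.0] * n
    # Base case: the farthest point.
    s = match[n - 1] / n
    values[order[n - 1]] = s
    # Walk back toward the nearest neighbor, reusing the previous value.
    for j in range(n - 2, -1, -1):
        rank = j + 1  # 1-indexed rank of the current point
        s += (match[j] - match[j + 1]) / k * min(k, rank) / rank
        values[order[j]] = s
    return values


# Toy 1-D example (made-up data): values are returned in training-set order.
train_x = [(0.0,), (1.0,), (2.0,), (5.0,)]
train_y = [0, 1, 0, 1]
values = knn_shapley(train_x, train_y, test_x=(0.4,), test_y=0, k=2)
print(values)
```

The cost is one sort per test point instead of one model evaluation per subset. For a whole test set you would average the per-test-point values. Low or negative values flag points that hurt the kNN prediction, which is often a sign of a mislabeled or out-of-distribution example.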

You can find more papers at the GitHub repo awesome-data-valuation.

Hope that helps!

i_sanitize_my_hands OP t1_jd35963 wrote on March 21, 2023 at 2:35 PM

Oh, I didn't think of going down the game theory route. Cool, thanks!!