Keiny t1_jd2vgtk wrote

Someone suggested active learning, but it may be more suitable to look into the subfield of data valuation.

Data valuation broadly aims to assign each data point a value that represents its contribution to a model’s overall performance. Many methods are based on game-theoretic solution concepts such as the Shapley value, which makes them very expensive to compute exactly. In practical settings, I would suggest the kNN-based Shapley surrogate by Jia et al. (2019) or LAVA by Just et al. (2023).
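
To give a feel for why the kNN surrogate is cheap, here is a minimal sketch of the closed-form recursion for an unweighted k-NN classifier, as I understand it from Jia et al. (2019). The function names, the Euclidean distance, and the plain-list data layout are my own assumptions for illustration, not code from the paper or from any library.

```python
from math import dist


def knn_shapley_single(train_x, train_y, test_x, test_y, k):
    """Shapley value of every training point for ONE test point.

    Utility (assumed): fraction of the k nearest neighbours in a subset
    whose label matches the test label.
    """
    n = len(train_x)
    # Training indices sorted from nearest to farthest from the test point.
    order = sorted(range(n), key=lambda i: dist(train_x[i], test_x))
    # 1.0 where the neighbour's label matches the test label, else 0.0.
    match = [1.0 if train_y[i] == test_y else 0.0 for i in order]

    values = [0.0] * n
    # Base case: the farthest point.
    s = match[-1] / n
    values[order[-1]] = s
    # Walk inward; `rank` is the 1-based position of the nearer point.
    for rank in range(n - 1, 0, -1):
        s += (match[rank - 1] - match[rank]) / k * min(k, rank) / rank
        values[order[rank - 1]] = s
    return values


def knn_shapley(train_x, train_y, test_x, test_y, k):
    """Average the per-test-point values over a whole test set."""
    totals = [0.0] * len(train_x)
    for x, y in zip(test_x, test_y):
        single = knn_shapley_single(train_x, train_y, x, y, k)
        for i, v in enumerate(single):
            totals[i] += v
    return [t / len(test_x) for t in totals]
```

The cost per test point is one sort plus a linear pass, instead of retraining on exponentially many subsets. Points with low or negative values are the ones that hurt k-NN accuracy on the test set, so they are natural candidates to inspect first.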

You can find more papers at the GitHub repo awesome-data-valuation.

Hope that helps!
