AI Summary
This work addresses the limitation of traditional data valuation methods in capturing the value of sample distributions by proposing the first distribution-level data valuation framework grounded in generalized Bayesian inference. The approach introduces a transferability-based loss function that unifies diverse tasks such as annotator evaluation and data augmentation, and naturally extends to continuous data stream settings. Its key innovation lies in applying generalized Bayesian inference at the distribution level, enabling efficient adaptation to dynamic and heterogeneous data environments. Extensive experiments across multiple real-world scenarios demonstrate the framework's effectiveness and strong generalization capability.
Abstract
We investigate the data distribution valuation problem, which aims to quantify the value of data distributions from their samples. This recently proposed problem is related to, but distinct from, classical data valuation, and it arises in a variety of applications. To address it, we develop a novel framework called Generalized Bayes Valuation, which uses generalized Bayesian inference with a loss constructed from transferability measures. This framework allows us to solve, in a unified way, seemingly unrelated practical problems such as annotator evaluation and data augmentation. Using Bayesian principles, we further broaden the applicability of our framework by extending it to the continuous data stream setting. Our experimental results confirm the effectiveness and efficiency of our framework in different real-world scenarios.
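For readers unfamiliar with generalized Bayesian inference, the generic update it refers to can be sketched as follows. This is the standard form of a generalized (Gibbs) posterior, not necessarily the exact formulation used in Generalized Bayes Valuation. The symbols are notation assumed here for illustration: \theta is the quantity being inferred, \pi the prior, \ell a loss (which, per the abstract, would be built from transferability measures), \eta > 0 a learning-rate parameter, and D_t the t-th batch of samples.

```latex
% Generalized (Gibbs) posterior: the likelihood is replaced by exp(-eta * loss).
\begin{align*}
  \pi(\theta \mid D) &\;\propto\; \pi(\theta)\,\exp\bigl(-\eta\,\ell(\theta; D)\bigr) \\
  % Sequential form, assuming the loss is additive over batches D_1, ..., D_T:
  \pi_t(\theta) &\;\propto\; \pi_{t-1}(\theta)\,\exp\bigl(-\eta\,\ell(\theta; D_t)\bigr),
  \qquad \pi_0 = \pi
\end{align*}
```

Under the additivity assumption, the posterior after batch t serves as the prior for batch t+1, so earlier batches need not be revisited. This is presumably what makes an extension to the continuous data stream setting natural.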