🤖 AI Summary
Problem: Traditional Shapley-value-based data valuation can be inaccurate on real-world datasets with heterogeneity and complex dependency structures, because its symmetry axiom implicitly treats all data points as homogeneous and interchangeable.
Method: The paper proposes asymmetric data Shapley, a structure-aware extension of the data Shapley framework. It relaxes the classical symmetry axiom so that data values can reflect asymmetric relationships and structure within a dataset. The authors also give an efficient k-nearest-neighbor (KNN)-based algorithm that computes these values exactly, aiming to retain theoretical guarantees (e.g., fairness, efficiency, structure-awareness) while remaining computationally tractable.
Contribution/Results: Experiments across diverse supervised learning tasks and data market scenarios show more accurate assessment of data contributions and greater sensitivity to dataset structure. The implementation is open source, and the framework is relevant to data pricing and interpretable machine learning.
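The Method line describes relaxing the Shapley symmetry axiom. As a minimal sketch, assuming asymmetry is encoded as a precedence order over groups of data points (for example, data collected in an earlier round must enter the coalition before later data), a value can be computed by averaging marginal contributions only over permutations that respect that order. This is an illustrative formulation, not the paper's exact definition or algorithm; `shapley_over`, `respects_groups`, and `toy_utility` are hypothetical names, and the brute-force enumeration is exponential in the number of points.

```python
from itertools import permutations
from typing import Callable, FrozenSet, List, Sequence

# A utility maps a coalition (set of data-point indices) to a score,
# e.g. validation accuracy of a model trained on that subset.
Utility = Callable[[FrozenSet[int]], float]


def respects_groups(perm: Sequence[int], group_of: Sequence[int]) -> bool:
    """True if every point of an earlier group precedes every point of a later group."""
    labels = [group_of[p] for p in perm]
    return all(a <= b for a, b in zip(labels, labels[1:]))


def shapley_over(n: int, utility: Utility,
                 keep: Callable[[Sequence[int]], bool]) -> List[float]:
    """Average marginal contributions over the permutations accepted by `keep`.

    An always-True `keep` gives the classical (symmetric) Shapley value;
    a restrictive `keep` gives an asymmetric variant. Brute force: O(n!).
    """
    totals = [0.0] * n
    count = 0
    for perm in permutations(range(n)):
        if not keep(perm):
            continue
        count += 1
        coalition = set()
        for player in perm:
            before = utility(frozenset(coalition))
            coalition.add(player)
            totals[player] += utility(frozenset(coalition)) - before
    return [t / count for t in totals]


def toy_utility(s: FrozenSet[int]) -> float:
    # Hypothetical utility: points 0 and 1 carry the same information,
    # point 2 carries independent information.
    return float(bool(s & {0, 1})) + float(2 in s)


classic = shapley_over(3, toy_utility, keep=lambda perm: True)
# Assumed structure: point 0 belongs to an earlier group than points 1 and 2.
asymmetric = shapley_over(3, toy_utility,
                          keep=lambda perm: respects_groups(perm, [0, 1, 1]))
print(classic)     # [0.5, 0.5, 1.0]
print(asymmetric)  # [1.0, 0.0, 1.0]
```

Under the classical value, the duplicated points 0 and 1 split credit equally; once point 0 is required to precede its duplicate, it receives the full credit. Both vectors sum to U(all) − U(∅) = 2, so efficiency still holds.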
📝 Abstract
As data emerges as a vital driver of technological and economic advancements, a key challenge is accurately quantifying its value in algorithmic decision-making. The Shapley value, a well-established concept from cooperative game theory, has been widely adopted to assess the contribution of individual data sources in supervised machine learning. However, its symmetry axiom assumes all players in the cooperative game are homogeneous, which overlooks the complex structures and dependencies present in real-world datasets. To address this limitation, we extend the traditional data Shapley framework to asymmetric data Shapley, making it flexible enough to incorporate inherent structures within the datasets for structure-aware data valuation. We also introduce an efficient $k$-nearest-neighbor-based algorithm for its exact computation. We demonstrate the practical applicability of our framework across various machine learning tasks and data market contexts. The code is available at: https://github.com/xzheng01/Asymmetric-Data-Shapley.
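The abstract mentions an exact $k$-nearest-neighbor-based algorithm without giving details. For orientation, the sketch below implements the classical, symmetric KNN-Shapley recursion for an unweighted $k$-NN classifier and a single test point (Jia et al., VLDB 2019). There, the utility of a training subset is the fraction of its $k$ nearest neighbors that share the test label. It is not the paper's asymmetric algorithm; `knn_shapley` is a hypothetical name, and Euclidean distance is an assumption.

```python
import numpy as np


def knn_shapley(x_train: np.ndarray, y_train: np.ndarray,
                x_test: np.ndarray, y_test: int, k: int) -> np.ndarray:
    """Exact (symmetric) data Shapley values of each training point for an
    unweighted k-NN classifier evaluated on one test point."""
    n = len(y_train)
    dist = np.linalg.norm(x_train - x_test, axis=1)  # Euclidean (assumed)
    order = np.argsort(dist, kind="stable")          # order[0] = nearest point
    match = (y_train[order] == y_test).astype(float)

    s = np.zeros(n)
    s[n - 1] = match[n - 1] / n  # farthest point
    # Walk from far to near: s_i = s_{i+1} + (m_i - m_{i+1}) / k * min(k, i) / i,
    # with i the 1-based distance rank.
    for i in range(n - 2, -1, -1):
        rank = i + 1
        s[i] = s[i + 1] + (match[i] - match[i + 1]) / k * min(k, rank) / rank

    values = np.zeros(n)
    values[order] = s  # map back from sorted order to original indices
    return values


x_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1, 0, 1, 1])
vals = knn_shapley(x_train, y_train, np.array([0.0]), y_test=1, k=2)
print(vals.tolist())  # [0.25, -0.25, 0.25, 0.25]
```

Sorting dominates, so each test point costs $O(N \log N)$, and values over a test set are usually averaged. The values sum to the utility of the full training set (here 0.5, since one of the two nearest neighbors matches), as the efficiency axiom requires.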