🤖 AI Summary
Exact computation of Shapley values is generally intractable in machine learning, and the most effective approximation methods rely on paired sampling, a heuristic whose underlying mechanism has remained unclear. This work shows that the Shapley value depends solely on the odd component of the set function, and that paired sampling orthogonalizes the regression objective so that the irrelevant even component is filtered out. Building on this insight, it proposes OddSHAP, a consistent estimator that performs polynomial regression exclusively on the odd subspace. The method uses the Fourier basis to isolate this subspace and a proxy model to identify high-impact higher-order interactions, thereby avoiding the combinatorial explosion of higher-order approximations. In an extensive benchmark evaluation, OddSHAP achieves state-of-the-art estimation accuracy.
📝 Abstract
The Shapley value is a ubiquitous framework for attribution in machine learning, encompassing feature importance, data valuation, and causal inference. However, its exact computation is generally intractable, necessitating efficient approximation methods. While the most effective and popular estimators leverage the paired sampling heuristic to reduce estimation error, the theoretical mechanism driving this improvement has remained opaque. In this work, we provide an elegant and fundamental justification for paired sampling: we prove that the Shapley value depends exclusively on the odd component of the set function, and that paired sampling orthogonalizes the regression objective to filter out the irrelevant even component. Leveraging this insight, we propose OddSHAP, a novel consistent estimator that performs polynomial regression solely on the odd subspace. By utilizing the Fourier basis to isolate this subspace and employing a proxy model to identify high-impact interactions, OddSHAP overcomes the combinatorial explosion of higher-order approximations. Through an extensive benchmark evaluation, we find that OddSHAP achieves state-of-the-art estimation accuracy.
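The claim that the Shapley value depends only on the odd component can be checked numerically on a small game. The sketch below is not the OddSHAP estimator. It assumes the odd and even components are defined through set complements, as f_odd(S) = (f(S) - f(N \ S)) / 2 and f_even(S) = (f(S) + f(N \ S)) / 2, which is the natural reading given that paired sampling evaluates a coalition together with its complement. The helper names `shapley_values`, `split_odd_even`, and `random_game` are hypothetical and chosen for illustration only.

```python
import itertools
import math
import random


def shapley_values(f, n):
    """Exact Shapley values of a set function f on players 0..n-1.

    f takes a frozenset of player indices and returns a float.
    Brute force over all coalitions, so only usable for small n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude player i.
            weight = (
                math.factorial(size)
                * math.factorial(n - size - 1)
                / math.factorial(n)
            )
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (f(s | {i}) - f(s))
    return phi


def split_odd_even(f, n):
    """Split f into odd and even parts under complementation (assumed definition)."""
    full = frozenset(range(n))

    def odd(s):
        return 0.5 * (f(s) - f(full - s))

    def even(s):
        return 0.5 * (f(s) + f(full - s))

    return odd, even


def random_game(n, seed=0):
    """A random set function stored as a lookup table over all 2^n coalitions."""
    rng = random.Random(seed)
    table = {
        frozenset(c): rng.uniform(-1.0, 1.0)
        for r in range(n + 1)
        for c in itertools.combinations(range(n), r)
    }
    return table.__getitem__


if __name__ == "__main__":
    n = 5
    f = random_game(n, seed=1)
    odd, even = split_odd_even(f, n)
    print("Shapley(f):     ", shapley_values(f, n))
    print("Shapley(f_odd): ", shapley_values(odd, n))
    print("Shapley(f_even):", shapley_values(even, n))
```

Under the assumed definitions, the first two printed vectors agree and the third is zero up to floating-point error. The reason is that for an even function the marginal contribution of a player to S is the negative of its marginal contribution to the complementary coalition, and both coalitions receive the same Shapley weight, so the terms cancel in pairs.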