Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Efficient, unbiased estimation of probabilistic values—such as the Shapley value, Banzhaf value, and semivalues—in explainable AI remains challenging due to high computational complexity and reliance on restrictive linear regression assumptions. Method: This paper proposes a decoupled framework integrating Monte Carlo sampling with flexible regression modeling. It combines any analytically tractable probabilistic value with expressive regression models (e.g., XGBoost), eliminating the need for linear approximations while preserving unbiasedness; variance reduction techniques are further incorporated to improve estimation accuracy. Contribution/Results: Evaluated on eight benchmark datasets, the method reduces Shapley value estimation error by 6.5×, 3.8×, and 2.6× compared to Permutation SHAP, Kernel SHAP, and Leverage SHAP, respectively. For more general probabilistic values, the error reduction reaches up to 215×. The framework strikes a strong balance between computational efficiency and statistical precision, enabling scalable, model-agnostic, and theoretically sound attribution.

📝 Abstract
With origins in game theory, probabilistic values like Shapley values, Banzhaf values, and semi-values have emerged as a central tool in explainable AI. They are used for feature attribution, data attribution, data valuation, and more. Since all of these values require exponential time to compute exactly, research has focused on efficient approximation methods using two techniques: Monte Carlo sampling and linear regression formulations. In this work, we present a new way of combining both of these techniques. Our approach is more flexible than prior algorithms, allowing for linear regression to be replaced with any function family whose probabilistic values can be computed efficiently. This allows us to harness the accuracy of tree-based models like XGBoost, while still producing unbiased estimates. From experiments across eight datasets, we find that our methods give state-of-the-art performance for estimating probabilistic values. For Shapley values, the error of our methods can be $6.5\times$ lower than Permutation SHAP (the most popular Monte Carlo method), $3.8\times$ lower than Kernel SHAP (the most popular linear regression method), and $2.6\times$ lower than Leverage SHAP (the prior state-of-the-art Shapley value estimator). For more general probabilistic values, we can obtain error $215\times$ lower than the best estimator from prior work.
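The core idea the abstract describes—Monte Carlo sampling corrected by a surrogate model whose probabilistic values are exactly computable—can be sketched as a control-variate estimator. The toy game, the linear surrogate, and all function names below are illustrative assumptions of mine, not the paper's actual benchmarks or implementation (which uses richer surrogates such as XGBoost); the sketch only shows why the combination stays unbiased.

```python
import random

# Toy cooperative game over n players: a linear part plus a pairwise
# interaction bonus (illustrative only, not from the paper).
n = 6
w = [1.0, 2.0, 0.5, 3.0, 1.5, 0.25]

def v(S):
    base = sum(w[j] for j in S)
    bonus = 0.3 * sum(1 for a in S for b in S if a < b)  # interactions
    return base + bonus

# Surrogate game: the linear part only. For a linear game, the Shapley
# value of player i is exactly w[i], so the surrogate's probabilistic
# values are analytically tractable -- the property the framework needs.
def v_lin(S):
    return sum(w[j] for j in S)

phi_lin = w  # exact Shapley values of the surrogate

def shapley_cv(i, m=2000, seed=0):
    """Control-variate Monte Carlo estimate of the Shapley value phi_i.

    Averages, over random permutations, the marginal contribution of i
    under the true game MINUS the surrogate game, then adds back the
    surrogate's exact Shapley value. The correction term has expectation
    equal to the surrogate's Shapley value, so the estimator is unbiased
    for ANY surrogate; a good surrogate shrinks the variance."""
    rng = random.Random(seed)
    players = list(range(n))
    acc = 0.0
    for _ in range(m):
        perm = players[:]
        rng.shuffle(perm)
        k = perm.index(i)
        S = set(perm[:k])  # players preceding i in this permutation
        acc += (v(S | {i}) - v(S)) - (v_lin(S | {i}) - v_lin(S))
    return acc / m + phi_lin[i]
```

In this toy game the residual marginal contribution is just the interaction term, so the surrogate absorbs the linear part of the variance; swapping `v_lin` for a more expressive model with tractable values is the paper's generalization.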
Problem

Research questions and friction points this paper is trying to address.

Efficiently estimating Shapley and probabilistic values
Combining Monte Carlo sampling with regression techniques
Improving accuracy over existing Shapley value estimators
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Monte Carlo sampling with regression
Uses flexible function families like XGBoost
Achieves state-of-the-art error reduction
R. T. Witter
New York University
Yurong Liu
New York University
Christopher Musco
Associate Professor, New York University
Algorithms · Theory of Computation · Machine Learning