Provably Adaptive Linear Approximation for the Shapley Value and Beyond

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the exponential number of utility queries that exact computation of Shapley values and other semivalues generally requires. Under a Θ(n) space constraint, it introduces Adalina, the first adaptive randomized algorithm that runs in linear time and linear space. Building on a vector concentration inequality, the framework guarantees P(‖φ̂ − φ‖₂ ≥ ε) ≤ δ for all commonly used semivalues with O((n/ε²)·log(1/δ)) utility queries, and it allows the mean squared error (MSE) to be minimized explicitly for each specific utility function. The framework unifies existing estimators (OFA, unbiased kernelSHAP, SHAP-IQ, and the regression-adjusted approach) and characterizes when paired sampling is beneficial. Experiments validate the theoretical findings, with Adalina attaining lower MSE than the baseline methods.
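To make the exponential cost concrete, the sketch below computes a semivalue exactly from its definition, $\phi_i = \sum_{S \subseteq N \setminus \{i\}} p_{|S|}\,[u(S \cup \{i\}) - u(S)]$, caching utility calls so that each of the $2^n$ coalitions is queried exactly once. It is a minimal illustration, not code from the paper: the names `exact_semivalue`, `shapley_weight`, and `banzhaf_weight` and the toy utility are hypothetical, and the weights used are the standard Shapley choice $p_s = 1/(n\binom{n-1}{s})$ and Banzhaf choice $p_s = 1/2^{n-1}$.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, List, Tuple

# A utility maps a coalition (set of player indices) to a real number.
Utility = Callable[[FrozenSet[int]], float]


def shapley_weight(n: int, s: int) -> float:
    # Shapley weight for a coalition S of size s that excludes player i.
    return 1.0 / (n * comb(n - 1, s))


def banzhaf_weight(n: int, s: int) -> float:
    # Banzhaf weight: every coalition excluding i counts equally.
    return 1.0 / 2 ** (n - 1)


def exact_semivalue(
    u: Utility, n: int, weight: Callable[[int, int], float] = shapley_weight
) -> Tuple[List[float], int]:
    """Exact semivalue by enumeration; returns (phi, number of distinct utility queries)."""
    cache: Dict[FrozenSet[int], float] = {}

    def query(S: FrozenSet[int]) -> float:
        # Each distinct coalition is evaluated once; the cache itself grows to 2^n entries.
        if S not in cache:
            cache[S] = u(S)
        return cache[S]

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for s in range(n):  # coalition sizes 0..n-1
            w = weight(n, s)
            for members in combinations(others, s):
                S = frozenset(members)
                phi[i] += w * (query(S | {i}) - query(S))
    return phi, len(cache)


if __name__ == "__main__":
    # Toy symmetric game u(S) = |S|^2 on 3 players: each Shapley value is u(N)/3 = 3.
    phi, queries = exact_semivalue(lambda S: float(len(S) ** 2), 3)
    print(phi, queries)  # phi is approximately [3.0, 3.0, 3.0]; queries == 2**3 == 8
```

Both the query count and the cache size double with every added player, which is why exact enumeration is ruled out both by the query budget and by a Θ(n) space constraint.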

📝 Abstract

The Shapley value, and its broader family of semi-values, has received much attention in various attribution problems. A fundamental and long-standing challenge is their efficient approximation, since exact computation generally requires an exponential number of utility queries in the number of players $n$. To meet the challenges of large-scale applications, we explore the limits of efficiently approximating semi-values under a $\Theta(n)$ space constraint. Building upon a vector concentration inequality, we establish a theoretical framework that enables sharper query complexities for existing unbiased randomized algorithms. Within this framework, we systematically develop a linear-space algorithm that requires $O\left(\frac{n}{\varepsilon^{2}}\log\frac{1}{\delta}\right)$ utility queries to ensure $P\left(\|\hat{\boldsymbol{\phi}}-\boldsymbol{\phi}\|_{2}\geq\varepsilon\right)\leq\delta$ for all commonly used semi-values. In particular, our framework naturally bridges OFA, unbiased kernelSHAP, SHAP-IQ and the regression-adjusted approach, and definitively characterizes when paired sampling is beneficial. Moreover, our algorithm allows explicit minimization of the mean square error for each specific utility function. Accordingly, we introduce the first adaptive, linear-time, linear-space randomized algorithm, Adalina, that theoretically achieves improved mean square error. All of our theoretical findings are experimentally validated.
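As a point of comparison, the following is a sketch of a classical permutation-sampling estimator of the Shapley value, not Adalina. It is unbiased and keeps only Θ(n) working state (a running sum per player, one permutation, and the current coalition), spending n utility queries per permutation plus one shared query of the empty coalition. The optional `paired` flag pairs each permutation with its reverse; treating this as an instance of the paired sampling mentioned in the abstract is an assumption of the sketch, and the function name `permutation_shapley` is hypothetical.

```python
import random
from typing import Callable, FrozenSet, List, Optional, Tuple

Utility = Callable[[FrozenSet[int]], float]


def permutation_shapley(
    u: Utility,
    n: int,
    num_perms: int,
    paired: bool = False,
    seed: Optional[int] = None,
) -> Tuple[List[float], int]:
    """Monte Carlo Shapley estimate with Theta(n) working memory.

    Returns (estimate, number of utility queries made).
    """
    if num_perms < 1:
        raise ValueError("num_perms must be positive")
    rng = random.Random(seed)
    totals = [0.0] * n          # running sum of marginal contributions per player
    v_empty = u(frozenset())    # u(empty set) is queried once and reused
    queries = 1
    walks = 0

    def walk(order: List[int]) -> None:
        # Add players one at a time; each new prefix costs one utility query.
        nonlocal queries
        coalition = set()
        prev = v_empty
        for i in order:
            coalition.add(i)
            cur = u(frozenset(coalition))
            queries += 1
            totals[i] += cur - prev
            prev = cur

    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        walk(order)
        walks += 1
        if paired:
            # Antithetic partner: the same permutation traversed in reverse.
            walk(order[::-1])
            walks += 1

    return [t / walks for t in totals], queries


if __name__ == "__main__":
    def square(S: FrozenSet[int]) -> float:
        return float(len(S) ** 2)

    est, q = permutation_shapley(square, 4, num_perms=3, paired=True, seed=0)
    print(est, q)  # [4.0, 4.0, 4.0, 4.0] 25
```

On the toy game u(S) = |S|², a player's marginal contribution depends only on its position k in the permutation (it equals 2k − 1), so pairing a permutation with its reverse cancels that positional noise and every estimate equals the exact Shapley value 4 for n = 4, at twice the query cost per permutation. For other utilities this benefit need not hold, which is the kind of question the paper's characterization of paired sampling addresses.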

Problem

Research questions and friction points this paper is trying to address.

Shapley value

semi-values

efficient approximation

linear space

utility queries

Innovation

Methods, ideas, or system contributions that make the work stand out.

semi-values

linear-space algorithm

adaptive approximation