PolySHAP: Extending KernelSHAP with Interaction-Informed Polynomial Regression

📅 2026-01-26

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

This work addresses the cost of computing Shapley values exactly, which requires $2^d$ game evaluations for a model with $d$ features and therefore does not scale to high-dimensional feature spaces. The authors propose PolySHAP, an approximation method that extends KernelSHAP by replacing its linear approximation of the game with higher-degree polynomial regression, which captures non-linear interactions between features. They prove that the resulting estimates are consistent, and that paired sampling (antithetic sampling) outputs exactly the same Shapley value approximations as second-order PolySHAP, which gives a theoretical justification for this previously heuristic modification. Experiments on various benchmark datasets show that PolySHAP yields empirically better Shapley value estimates than KernelSHAP.

📝 Abstract

Shapley values have emerged as a central game-theoretic tool in explainable AI (XAI). However, computing Shapley values exactly requires $2^d$ game evaluations for a model with $d$ features. Lundberg and Lee's KernelSHAP algorithm has emerged as a leading method for avoiding this exponential cost. KernelSHAP approximates Shapley values by approximating the game as a linear function, which is fit using a small number of game evaluations for random feature subsets. In this work, we extend KernelSHAP by approximating the game via higher-degree polynomials, which capture non-linear interactions between features. Our resulting PolySHAP method yields empirically better Shapley value estimates for various benchmark datasets, and we prove that these estimates are consistent. Moreover, we connect our approach to paired sampling (antithetic sampling), a ubiquitous modification to KernelSHAP that improves empirical accuracy. We prove that paired sampling outputs exactly the same Shapley value approximations as second-order PolySHAP, without ever fitting a degree-2 polynomial. To the best of our knowledge, this finding provides the first strong theoretical justification for the excellent practical performance of the paired sampling heuristic.
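
To make the KernelSHAP baseline described in the abstract concrete, the following is a minimal sketch, not the authors' code. It computes exact Shapley values for a small toy game by enumerating all subsets, then recovers the same values by fitting a linear function to the game with Shapley-kernel-weighted least squares. The names `toy_game`, `exact_shapley`, `kernel_shap`, and `all_proper_masks` are hypothetical and chosen for illustration, and the regression here uses every proper subset rather than a random sample, which is only feasible for small $d$. PolySHAP's polynomial fit itself is not reproduced.

```python
from itertools import combinations
from math import comb, factorial

import numpy as np


def toy_game(z):
    """Hypothetical game on d = 4 features with one pairwise interaction."""
    return 1.0 * z[0] + 2.0 * z[1] - 1.0 * z[2] + 3.0 * z[0] * z[1]


def exact_shapley(game, d):
    """Exact Shapley values from the subset-sum definition (exponential cost)."""
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                z = np.zeros(d)
                for j in subset:
                    z[j] = 1.0
                value_without_i = game(z)
                z[i] = 1.0
                phi[i] += weight * (game(z) - value_without_i)
    return phi


def all_proper_masks(d):
    """Binary masks for every subset except the empty and the full set."""
    return [
        [1.0 if j in subset else 0.0 for j in range(d)]
        for size in range(1, d)
        for subset in combinations(range(d), size)
    ]


def kernel_shap(game, d, masks):
    """Fit a linear approximation of the game by weighted least squares.

    The coefficients are constrained to sum to game(full) - game(empty);
    the constraint is enforced by eliminating the last coefficient.
    """
    masks = np.asarray(masks, dtype=float)
    sizes = masks.sum(axis=1).astype(int)
    # Shapley kernel weight for a subset of size s.
    weights = np.array(
        [(d - 1) / (comb(d, int(s)) * int(s) * (d - int(s))) for s in sizes]
    )
    v_empty = game(np.zeros(d))
    delta = game(np.ones(d)) - v_empty
    values = np.array([game(z) for z in masks])

    # Substitute phi_last = delta - sum(other phis) into the linear model.
    X = masks[:, :-1] - masks[:, -1:]
    y = values - v_empty - masks[:, -1] * delta

    sqrt_w = np.sqrt(weights)
    head, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return np.append(head, delta - head.sum())


d = 4
phi_exact = exact_shapley(toy_game, d)
phi_kernel = kernel_shap(toy_game, d, all_proper_masks(d))
print("exact:     ", phi_exact)
print("KernelSHAP:", phi_kernel)
```

With all proper subsets included, the constrained regression reproduces the exact Shapley values. In the setting the abstract describes, only a small random sample of subsets is evaluated, so the fit becomes an approximation; that approximation is what paired sampling and higher-degree polynomials aim to improve.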

Problem

Research questions and friction points this paper is trying to address.

Shapley values

explainable AI

feature interactions

KernelSHAP

computational complexity

Innovation

Methods, ideas, or system contributions that make the work stand out.

PolySHAP

Shapley values

polynomial regression