🤖 AI Summary
This work addresses the challenge of attributing high-order feature interactions, such as second- and higher-order joint contributions, in machine learning models. Methodologically, it generalizes Integrated Gradients to multidimensional tensor spaces, combining statistical sufficiency conditions with chain complex structures from topological signal processing to define and derive high-order attribution operators that satisfy uniqueness, symmetry, and conservation properties. Theoretically, it proposes a unified mathematical foundation for high-order attribution and draws connections among explainable AI, statistical interaction effects, and topological data analysis. Empirically, the framework is validated on a few examples that involve nonlinear feature interactions.
📝 Abstract
Feature attributions are post-training analysis methods that assess how various input features of a machine learning model contribute to an output prediction. Their interpretation is straightforward when features act independently, but becomes less direct when the predictive model involves interactions such as multiplicative relationships or joint feature contributions. In this work, we propose a general theory of higher-order feature attribution, which we develop on the foundation of Integrated Gradients (IG). This work extends existing frameworks in the literature on explainable AI. When using IG as the method of feature attribution, we discover natural connections to statistics and topological signal processing. We provide several theoretical results that establish the theory, and we validate our theory on a few examples.
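To make the motivation concrete, the following is a minimal sketch of standard first-order Integrated Gradients applied to a purely multiplicative toy model. It is not the higher-order operator proposed in the paper. The function names, the zero baseline, the midpoint Riemann sum, and the finite-difference gradient are all assumptions chosen for illustration.

```python
def integrated_gradients(f, x, baseline, steps=200, eps=1e-5):
    """Approximate first-order IG along the straight path baseline -> x.

    Assumptions for this sketch: a midpoint Riemann sum for the path
    integral and central finite differences for the gradient.
    """
    n = len(x)
    attributions = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th subinterval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i in range(n):
            up = list(point)
            up[i] += eps
            down = list(point)
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)  # partial derivative w.r.t. x_i
            attributions[i] += grad_i * (x[i] - baseline[i]) / steps
    return attributions


def product_model(v):
    # Hypothetical toy model: the output depends only on the interaction x1 * x2.
    return v[0] * v[1]


x = [2.0, 3.0]
baseline = [0.0, 0.0]
attr = integrated_gradients(product_model, x, baseline)
print(attr, sum(attr), product_model(x) - product_model(baseline))
```

In this toy case the two attributions are equal and sum to the change in the model output, yet neither number indicates that the entire effect comes from the product of the two features. This is the kind of ambiguity that a higher-order attribution is meant to resolve.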