Informative Post-Hoc Explanations Only Exist for Simple Functions

📅 2025-08-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of theoretical guarantees for local post-hoc explanation algorithms applied to complex models. We propose a formal framework grounded in statistical learning theory, defining an "informative explanation" as one that significantly reduces the complexity of the space of admissible decision functions. Through rigorous mathematical analysis, we show that mainstream methods, including gradient-based, SHAP, counterfactual, and anchor explanations, generally fail to be informative on rich function classes such as differentiable functions and decision trees, which cover deep neural networks and tree ensembles. Informative explanations are recoverable only under additional assumptions on the underlying model, such as smoothness, sparsity, or low-dimensional structure. Our findings challenge the implicit assumption that "any model is explainable," establishing strict theoretical limits on algorithmic interpretability. Moreover, the framework provides verifiable, principled criteria for auditing and regulating explanations in high-stakes AI applications.

📝 Abstract
Many researchers have suggested that local post-hoc explanation algorithms can be used to gain insights into the behavior of complex machine learning models. However, theoretical guarantees about such algorithms only exist for simple decision functions, and it is unclear whether and under which assumptions similar results might exist for complex models. In this paper, we introduce a general, learning-theory-based framework for what it means for an explanation to provide information about a decision function. We call an explanation informative if it serves to reduce the complexity of the space of plausible decision functions. With this approach, we show that many popular explanation algorithms are not informative when applied to complex decision functions, providing a rigorous mathematical rejection of the idea that it should be possible to explain any model. We then derive conditions under which different explanation algorithms become informative. These are often stronger than what one might expect. For example, gradient explanations and counterfactual explanations are non-informative with respect to the space of differentiable functions, and SHAP and anchor explanations are not informative with respect to the space of decision trees. Based on these results, we discuss how explanation algorithms can be modified to become informative. While the proposed analysis of explanation algorithms is mathematical, we argue that it holds strong implications for the practical applicability of these algorithms, particularly for auditing, regulation, and high-risk applications of AI.
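The paper's core notion of an "informative explanation" can be made concrete with a toy experiment (not from the paper): take an exhaustively enumerable hypothesis space and count how many functions remain consistent with an anchor-style explanation. In this tiny Boolean setting the anchor shrinks the space substantially; the paper's point is that over the richer classes it studies (all differentiable functions, all decision trees), the analogous reduction can be negligible. All names below are illustrative, not the paper's notation.

```python
# Toy sketch of "informativeness as complexity reduction": an explanation is
# informative if the set of hypotheses consistent with it is much smaller
# than the full hypothesis space.
import itertools

# Hypothesis space: all Boolean functions on 3 binary features,
# represented as truth tables over the 8 possible inputs (2^8 = 256 functions).
inputs = list(itertools.product([0, 1], repeat=3))
hypotheses = list(itertools.product([0, 1], repeat=len(inputs)))

def consistent_with_anchor(table, feature, value, prediction):
    """Keep functions that output `prediction` whenever the anchor condition holds."""
    return all(out == prediction
               for x, out in zip(inputs, table)
               if x[feature] == value)

# Anchor-style explanation: "whenever feature 0 == 1, the model predicts 1".
admissible = [t for t in hypotheses if consistent_with_anchor(t, 0, 1, 1)]

# 4 of the 8 inputs are pinned to output 1; the other 4 stay free: 2^4 = 16.
print(len(hypotheses), len(admissible))  # 256 16
```

Here the anchor is highly informative: it cuts the admissible space by a factor of 16. The paper's negative results say that for spaces like all decision trees, an anchor leaves the admissible class essentially as complex as before, so no such reduction occurs.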
Problem

Research questions and friction points this paper is trying to address.

Determining whether local post-hoc explanations yield real insight into complex models
Showing that many popular explanation algorithms are not informative for complex function classes
Deriving conditions under which explanation algorithms become informative
Innovation

Methods, ideas, or system contributions that make the work stand out.

A learning-theoretic framework that defines informative explanations via complexity reduction of the admissible function space
Proofs that popular explanations (gradient, counterfactual, SHAP, anchor) are non-informative on complex function classes
Conditions (e.g., smoothness, sparsity, low-dimensional structure) under which explanations become informative