Provably Explaining Neural Additive Models

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the intractability of generating minimal and provably correct feature explanations for general neural networks, which typically requires an exponential number of verification queries. Focusing on Neural Additive Models (NAMs), the paper introduces the first specialized algorithm that exploits their structural properties to efficiently produce provably cardinally-minimal explanations. By integrating parallelized preprocessing with a verification mechanism that needs only a logarithmic number of queries, the method overcomes the computational bottlenecks inherent in conventional formal explanation approaches. It achieves strict interpretability guarantees while significantly outperforming existing approximation- and sampling-based techniques. Experimental results demonstrate that the proposed approach yields more concise explanations, faster computation, and stronger theoretical assurances than prior methods.
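To make the summary concrete, the sketch below shows the defining structure of a NAM: the prediction is a sum of independent univariate "shape" functions, one per input feature. The function names and shapes here are hypothetical stand-ins (in a real NAM each shape function is a small neural network trained on a single feature); this is an illustration of the model family, not the paper's implementation.

```python
# Illustrative sketch of a Neural Additive Model (NAM): the output is the
# sum of per-feature univariate contributions plus a bias. In a trained NAM
# each shape function would be a small neural network; simple closed-form
# functions stand in here.
import math

def nam_predict(x, shape_fns, bias=0.0):
    """Sum each feature's univariate contribution, plus a bias term."""
    return bias + sum(f(xi) for f, xi in zip(shape_fns, x))

# Hypothetical shape functions for a 3-feature model.
shape_fns = [
    lambda v: 2.0 * v,        # linear effect
    lambda v: math.tanh(v),   # saturating effect
    lambda v: -v ** 2,        # concave effect
]

print(nam_predict([1.0, 0.0, 2.0], shape_fns))  # 2.0 + 0.0 - 4.0 = -2.0
```

It is this per-feature independence that the paper's algorithm exploits: each univariate component can be analyzed in isolation (and in parallel), which is what makes provably minimal explanations tractable for NAMs.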

📝 Abstract
Despite significant progress in post-hoc explanation methods for neural networks, many remain heuristic and lack provable guarantees. A key approach for obtaining explanations with provable guarantees is to identify a cardinally-minimal subset of input features which by itself is provably sufficient to determine the prediction. However, for standard neural networks, this task is often computationally infeasible, as it demands a worst-case exponential number of verification queries in the number of input features, each of which is NP-hard. In this work, we show that for Neural Additive Models (NAMs), a recent and more interpretable neural network family, we can efficiently generate explanations with such guarantees. We present a new model-specific algorithm for NAMs that generates provably cardinally-minimal explanations using only a logarithmic number of verification queries in the number of input features, after a parallelized preprocessing step with logarithmic runtime in the required precision is applied to each small univariate NAM component. Our algorithm not only makes the task of obtaining cardinally-minimal explanations feasible, but even outperforms existing algorithms designed to find the relaxed variant of subset-minimal explanations (which may be larger and less informative but easier to compute), despite solving a much more difficult task. Our experiments demonstrate that our approach provides provably smaller explanations than previous algorithms and substantially reduces computation time. Moreover, we show that our generated provable explanations offer benefits that are unattainable by the standard sampling-based techniques typically used to interpret NAMs.
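The abstract's key insight can be sketched as follows. Because a NAM's score is additive, a fixed subset of features is provably sufficient exactly when the worst-case values of the remaining free features cannot flip the prediction; ranking features by how much "slack" freeing each one gives up makes sufficiency monotone along the ranking, so the smallest sufficient prefix can be found with a logarithmic number of sufficiency checks. The code below is a simplified illustration of that idea for an additive binary classifier (sign of the score), not the paper's actual algorithm: all function names are hypothetical, and the worst-case bound per feature is computed over a coarse grid, standing in for the paper's precision-controlled preprocessing.

```python
# Hedged sketch: cardinally-minimal sufficient explanation for an additive
# binary classifier via sorting plus binary search. Assumes the model
# predicts the positive class (score > 0). Illustrative only.
def minimal_explanation(x, shape_fns, bias, grids):
    # "Preprocessing": each feature's contribution at x, and its worst-case
    # (minimum) contribution over its domain grid.
    contrib = [f(xi) for f, xi in zip(shape_fns, x)]
    worst = [min(f(v) for v in g) for f, g in zip(shape_fns, grids)]
    # Slack lost by freeing each feature, sorted largest-first: keeping the
    # highest-slack features is optimal, so the smallest sufficient sorted
    # prefix is a cardinally-minimal explanation.
    loss = sorted(((c - w, i) for i, (c, w) in enumerate(zip(contrib, worst))),
                  reverse=True)
    score = bias + sum(contrib)
    assert score > 0, "sketch assumes the positive class is predicted"

    def sufficient(k):
        # "Verification query": with only the top-k features fixed, can the
        # freed features drag the score below the decision boundary?
        return score - sum(l for l, _ in loss[k:]) > 0

    # Binary search for the smallest sufficient prefix: O(log n) queries.
    lo, hi = 0, len(loss)
    while lo < hi:
        mid = (lo + hi) // 2
        if sufficient(mid):
            hi = mid
        else:
            lo = mid + 1
    return sorted(i for _, i in loss[:lo])

# Toy 3-feature additive model over the domain [-1, 1] per feature.
shape_fns = [lambda v: 2 * v, lambda v: v, lambda v: v]
grids = [[-1.0, 0.0, 1.0]] * 3
print(minimal_explanation([1.0, 0.5, 0.0], shape_fns, 0.0, grids))  # [0, 1]
```

In the toy run, freeing all three features could flip the prediction, and fixing feature 0 alone is not enough, so the minimal sufficient subset is features {0, 1}. The monotonicity that justifies the binary search is a structural property of additive models; for general networks each sufficiency check would itself be an NP-hard verification query, which is the bottleneck the paper's NAM-specific approach avoids.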
Problem

Research questions and friction points this paper is trying to address.

provable explanations
cardinally-minimal explanations
Neural Additive Models
feature attribution
model interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Additive Models
provable explanations
cardinally-minimal explanations
verification queries
model-specific algorithm