🤖 AI Summary
This paper exposes a fundamental flaw in generalized additive models (GAMs), including neural additive models, that are claimed to be "interpretable" and "suitable for safety-critical applications": they are non-identifiable at three levels (parameters, component functions, and model structure), which makes local and global attributions non-unique and undermines the reliability of interpretations built on them.
Method: The authors systematically classify these types of non-identifiability and show how each one compromises interpretability, combining statistical identifiability theory, function-space analysis, counterexample construction, and sensitivity analysis; they further propose an identifiability-grounded framework for evaluating interpretability.
Contribution/Results: The work establishes "identifiability as a prerequisite for interpretability" as a formal principle. It provides theoretical boundaries and practical warnings for trustworthy model selection, highlighting that unaddressed non-identifiability can invalidate attribution-based explanations, even in ostensibly transparent models.
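To make the simplest form of this non-identifiability concrete, here is a short worked example (our illustration, not taken from the paper): in an additive model, a constant can be shifted freely between the intercept and any component without changing a single prediction,

$$
f(\mathbf{x}) = \beta_0 + f_1(x_1) + f_2(x_2) = (\beta_0 - c) + \bigl(f_1(x_1) + c\bigr) + f_2(x_2) \quad \text{for every } c \in \mathbb{R},
$$

so the per-feature attribution $f_1(x_1)$ is only defined up to an arbitrary constant unless an additional constraint, such as $\mathbb{E}[f_1(x_1)] = 0$, is imposed.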
📝 Abstract
We review generalized additive models as a type of "transparent" model that has recently seen renewed interest in the deep learning community as neural additive models. We highlight multiple types of nonidentifiability in this model class and discuss challenges in interpretability, arguing for restraint when claiming "interpretability" or "suitability for safety-critical applications" of such models.
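The non-uniqueness of attributions can also be demonstrated numerically. The following is a minimal sketch (our own toy code with made-up shape functions, not code released with the paper): two parameterizations of the same additive model agree on every prediction yet explain the same feature value differently.

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.0, size=(2, 1000))

# Parameterization A of the additive model f(x) = f1(x1) + f2(x2).
f1_a = lambda x: np.sin(3 * x)
f2_a = lambda x: x ** 2

# Parameterization B: move a constant c from f2 into f1.
c = 5.0
f1_b = lambda x: np.sin(3 * x) + c
f2_b = lambda x: x ** 2 - c

# Identical predictions on every input ...
assert np.allclose(f1_a(x1) + f2_a(x2), f1_b(x1) + f2_b(x2))

# ... but different per-feature "attributions" for the same input value.
print(f1_a(0.5), f1_b(0.5))  # ~0.997 versus ~5.997
```

Neural additive models inherit this same freedom, on top of the parameter-level non-identifiability of the networks that realize each component, which is why the paper urges caution before treating such attributions as explanations.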