🤖 AI Summary
This paper addresses the weak causality and limited stability of existing global explanation methods for deep neural networks (DNNs). To this end, it introduces the notion of *Intrinsic Causal Contribution (ICC)*, modeling DNNs as structural causal models (SCMs) and establishing an identifiable generative post-hoc explanation framework. Theoretically, the authors derive an equivalence between ICC and Sobol' sensitivity indices, moving beyond the conventional direct/indirect effect paradigm and enabling unbiased quantification of the intrinsic causal effects of input features. Methodologically, the approach combines causal intervention estimation with generative post-hoc inference. Experiments on synthetic and real-world datasets show that ICC yields more intuitive and more robust global attributions than state-of-the-art approaches such as SHAP and Integrated Gradients, providing a new foundation for causal interpretability in trustworthy AI.
📝 Abstract
Quantifying the causal influence of input features within neural networks has become a topic of increasing interest. Existing approaches typically assess direct, indirect, and total causal effects. This work treats NNs as structural causal models (SCMs) and extends the focus to intrinsic causal contributions (ICC). We propose an identifiable generative post-hoc framework for quantifying ICC, and we establish a relationship between ICC and Sobol' indices. Our experiments on synthetic and real-world datasets demonstrate that ICC generates more intuitive and reliable explanations than existing global explanation techniques.
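To make the ICC–Sobol' connection concrete, the sketch below estimates first-order Sobol' sensitivity indices, the variance-based quantities the abstract relates ICC to. This is an illustrative example, not the paper's method: it assumes independent Uniform(0,1) inputs, a simple hand-picked toy model, and uses the standard Saltelli pick-freeze Monte Carlo estimator.

```python
import numpy as np

def first_order_sobol(f, n_inputs, n_samples=100_000, seed=None):
    """Pick-freeze Monte Carlo estimate of first-order Sobol' indices
    S_i = Var(E[f | X_i]) / Var(f) for independent Uniform(0,1) inputs."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_samples, n_inputs))  # two independent sample matrices
    B = rng.random((n_samples, n_inputs))
    fA, fB = f(A), f(B)
    total_var = np.var(np.concatenate([fA, fB]))
    S = np.empty(n_inputs)
    for i in range(n_inputs):
        AB = A.copy()
        AB[:, i] = B[:, i]  # replace only column i of A with B's column i
        # Saltelli-style estimator of Var(E[f | X_i]) / Var(f)
        S[i] = np.mean(fB * (f(AB) - fA)) / total_var
    return S

# Hypothetical toy model: X3 has no effect, so the analytic indices
# are S = (0.2, 0.8, 0.0) since Var(f) = 1/12 + 4/12 = 5/12.
def model(X):
    return X[:, 0] + 2.0 * X[:, 1]

S = first_order_sobol(model, 3, seed=0)
```

A feature with a high first-order index explains a large share of the output variance on its own; the paper's claim is that ICC admits an equivalence to such variance-based indices, which is what lends it the global, causal reading.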