🤖 AI Summary
Addressing the challenge of balancing model interpretability and performance in high-stakes medical applications, this paper proposes an end-to-end interpretable multi-label classification framework: it first generates class-specific counterfactual attribution maps, then drives logistic regression classification using these maps—enabling simultaneous sample-level local and model-level global explanations. Key contributions include: (1) the first unified architecture that intrinsically supports both global and local interpretability; (2) a clinical-knowledge-guided regularization mechanism ensuring decisions are “right for the right reasons”; and (3) joint modeling of class-center representations and classifier weights to yield semantically meaningful, model-level attributions. The method achieves state-of-the-art performance across multiple medical imaging multi-label benchmarks. Clinical experts highly endorse the generated attribution maps, and quantitative evaluation shows significantly higher explanation consistency compared to mainstream post-hoc methods.
📝 Abstract
Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc explanation methods provide a way to understand neural networks but have been shown to suffer from conceptual problems. Moreover, current research largely focuses on providing local explanations for individual samples rather than global explanations for the model itself. In this paper, we propose Attri-Net, an inherently interpretable model for multi-label classification that provides both local and global explanations. Attri-Net first counterfactually generates class-specific attribution maps to highlight the disease evidence, then performs classification with logistic regression classifiers based solely on the attribution maps. Local explanations for each prediction can be obtained by interpreting the attribution maps weighted by the classifiers’ weights. A global explanation of the whole model can be obtained by jointly considering the learned average representations of the attribution maps for each class (called the class centers) and the weights of the linear classifiers. To ensure the model is “right for the right reason”, we introduce a mechanism to guide the model’s explanations to align with human knowledge. Our comprehensive evaluations show that Attri-Net can generate high-quality explanations consistent with clinical knowledge without sacrificing classification performance. Our code is available at https://github.com/ss-sun/Attri-Net-V2
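The classification-and-explanation stage described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the attribution-map generator is replaced by random stand-in arrays, and all names (`attr_maps`, `class_centers`, `local_explanation`, etc.) and shapes are hypothetical. It only shows the core idea that each class is predicted by a logistic regression on its own attribution map, so local explanations are pixel-wise contributions (map × weights) and global explanations pair the class centers with the same weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, h, w = 3, 8, 8  # toy sizes; real maps match the input-image resolution

# Stand-ins for learned components (hypothetical; the counterfactual
# generator from the paper would produce attr_maps from an input image).
attr_maps = rng.normal(size=(n_classes, h, w))      # class-specific attribution maps
weights = rng.normal(size=(n_classes, h * w))       # per-class logistic-regression weights
biases = np.zeros(n_classes)                        # per-class biases
class_centers = rng.normal(size=(n_classes, h, w))  # learned average map per class

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(attr_maps, weights, biases):
    """Multi-label prediction: class c is scored only from its own map."""
    flat = attr_maps.reshape(len(attr_maps), -1)
    logits = np.sum(flat * weights, axis=1) + biases
    return sigmoid(logits)  # independent per-class probabilities

def local_explanation(attr_maps, weights):
    """Per-pixel contribution to each class logit (map weighted by weights)."""
    return attr_maps * weights.reshape(attr_maps.shape)

def global_explanation(class_centers, weights):
    """Model-level attribution: class centers weighted by classifier weights."""
    return class_centers * weights.reshape(class_centers.shape)

probs = predict(attr_maps, weights, biases)
contrib = local_explanation(attr_maps, weights)
```

A useful sanity check of this formulation: summing the local contribution map for a class (plus its bias) recovers exactly that class's logit, which is what makes the explanation faithful to the prediction by construction.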