🤖 AI Summary
In multi-label medical image classification, class-specific features are often corrupted by irrelevant information, leading to weak causal interpretability. To address this, we propose an information-bottleneck-guided causal attention method. Our approach constructs an explicit structural causal model to disentangle causal, spurious, and noisy factors; designs a Gaussian mixture-based class-specific spatial attention mechanism; and integrates a contrastive causal intervention to achieve causal feature disentanglement and suppression of non-causal information. By jointly optimizing information bottleneck constraints, multi-head attention alignment, and contrastive learning, the method significantly enhances feature discriminability and interpretability. On the Endo and MuReD benchmarks, it achieves up to a 5.02% improvement in mean average precision (mAP), with key metrics, including per-class recall (CR) and overall recall (OR), reaching state-of-the-art performance. Empirical results validate its dual advantages: improved diagnostic accuracy and reliable etiological attribution.
📝 Abstract
Multi-label classification (MLC) of medical images aims to identify multiple diseases and holds significant clinical potential. A critical step is to effectively learn class-specific features for accurate diagnosis and improved interpretability. However, current works focus primarily on causal attention to learn class-specific features, yet they struggle to identify the true cause due to inadvertent attention to class-irrelevant features. To address this challenge, we propose a new structural causal model (SCM) that treats class-specific attention as a mixture of causal, spurious, and noisy factors, and a novel Information Bottleneck-based Causal Attention (IBCA) that is capable of learning discriminative class-specific attention for MLC of medical images. Specifically, we propose learning a Gaussian mixture multi-label spatial attention to filter out class-irrelevant information and capture each class-specific attention pattern. Then a contrastive enhancement-based causal intervention is proposed to gradually mitigate spurious attention and reduce noisy information by aligning multi-head attention with the Gaussian mixture multi-label spatial attention. Quantitative and ablation results on Endo and MuReD show that IBCA outperforms all compared methods. Compared to the second-best results for each metric, IBCA achieves improvements of 6.35% in CR, 7.72% in OR, and 5.02% in mAP on MuReD, and 1.47% in CR, 1.65% in CF1, and 1.42% in mAP on Endo.
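The paper itself does not include code; as a rough, hypothetical sketch of what a Gaussian-mixture class-specific spatial attention could look like, the snippet below computes one attention map per class from per-class Gaussian responsibilities over spatial locations, then pools class-specific features. All names, the single-component-per-class simplification, and the diagonal-covariance choice are our illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def gaussian_mixture_attention(feats, means, log_vars, priors):
    """Hypothetical class-specific spatial attention sketch.

    feats:    (HW, D) flattened spatial feature map
    means:    (C, D)  one Gaussian mean per class (one component for brevity)
    log_vars: (C, D)  log of diagonal variances per class
    priors:   (C,)    class mixture priors
    Returns (C, HW) attention maps, each summing to 1 over locations.
    """
    diff = feats[None, :, :] - means[:, None, :]            # (C, HW, D)
    var = np.exp(log_vars)[:, None, :]                      # (C, 1, D)
    # log N(x | mu_c, diag(var_c)) at every spatial location
    log_lik = -0.5 * np.sum(diff ** 2 / var + np.log(2 * np.pi * var), axis=-1)
    scores = np.log(priors)[:, None] + log_lik              # (C, HW)
    # softmax over spatial locations -> one attention map per class
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn

def class_specific_pool(feats, attn):
    # (C, HW) @ (HW, D) -> (C, D): one pooled feature vector per class
    return attn @ feats
```

In the full method these maps would additionally be regularized by the information bottleneck constraint and aligned with multi-head attention via the contrastive causal intervention; this fragment only illustrates the spatial-attention building block.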