When Fine-Tuning Changes the Evidence: Architecture-Dependent Semantic Drift in Chest X-Ray Explanations

πŸ“… 2026-04-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the trade-off between accuracy and interpretability in multi-class chest X-ray classification, where fine-tuning improves performance but may induce semantic drift in the visual evidence underpinning model explanations, thereby undermining clinical trust. The authors propose a two-stage training protocol and systematically compare attribution maps generated under transfer learning versus full fine-tuning across DenseNet201, ResNet50V2, and InceptionV3. They demonstrate for the first time that explanation stability is jointly determined by model architecture, optimization phase, and attribution method, with stability rankings even reversing across attribution techniques. Using LayerCAM and GradCAM++ together with reference-free metrics such as IoU to assess spatial consistency, they find that coarse-grained anatomical localization remains stable, whereas fine-grained evidential structures are highly architecture-dependent, revealing that high accuracy does not guarantee reliable interpretability.
πŸ“ Abstract
Transfer learning followed by fine-tuning is widely adopted in medical image classification due to consistent gains in diagnostic performance. However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions. We define semantic drift as systematic changes in the attribution structure supporting a model's predictions between transfer learning and full fine-tuning, reflecting potential shifts in underlying visual reasoning despite stable classification performance. Using a five-class chest X-ray task, we evaluate DenseNet201, ResNet50V2, and InceptionV3 under a two-stage training protocol and quantify drift with reference-free metrics capturing spatial localization and structural consistency of attribution maps. Across architectures, coarse anatomical localization remains stable, while overlap IoU reveals pronounced architecture-dependent reorganization of evidential structure. Beyond single-method analysis, stability rankings can reverse across LayerCAM and GradCAM++ under converged predictive performance, establishing explanation stability as an interaction between architecture, optimization phase, and attribution objective.
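The abstract quantifies drift with reference-free overlap metrics such as IoU between attribution maps produced before and after fine-tuning. A minimal sketch of that idea, assuming the common convention of binarizing each saliency map at a top-fraction threshold before computing overlap (the paper's exact thresholding and metric definitions are not given here, so `frac=0.2` and the helper names are illustrative):

```python
import numpy as np

def binarize_top_fraction(attr: np.ndarray, frac: float = 0.2) -> np.ndarray:
    """Keep the top `frac` of attribution values as a binary evidence mask."""
    thresh = np.quantile(attr, 1.0 - frac)
    return attr >= thresh

def attribution_iou(attr_a: np.ndarray, attr_b: np.ndarray, frac: float = 0.2) -> float:
    """IoU between high-attribution regions of two saliency maps.

    A value near 1.0 means the evidential structure is preserved across
    training stages; lower values indicate spatial reorganization (drift).
    """
    a = binarize_top_fraction(attr_a, frac)
    b = binarize_top_fraction(attr_b, frac)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Toy check: identical maps overlap perfectly; a shifted map overlaps less.
rng = np.random.default_rng(0)
saliency = rng.random((224, 224))
iou_same = attribution_iou(saliency, saliency)
iou_shifted = attribution_iou(saliency, np.roll(saliency, 30, axis=1))
```

In practice `attr_a` and `attr_b` would be, e.g., LayerCAM maps for the same image under transfer learning versus full fine-tuning, averaged over a test set per class and architecture.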
Problem

Research questions and friction points this paper is trying to address.

semantic drift
fine-tuning
medical image classification
attribution stability
chest X-ray
Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic drift
attribution stability
transfer learning
medical image explanation
architecture-dependent
πŸ”Ž Similar Papers
No similar papers found.
Kabilan Elangovan
AI Scientist, SingHealth
Artificial Intelligence · Deep Learning · Digital Healthcare · Generative AI
Daniel Ting
Singapore Health Services, Singapore; Singapore Eye Research Institute, Singapore