Joint enhancement of automatic chest X-ray diagnosis and radiological gaze prediction with multi-stage cooperative learning

📅 2024-03-25

🤖 AI Summary

This study addresses the dual challenge of improving diagnostic accuracy and interpretability in automated chest X-ray analysis by explicitly modeling radiologists' visual attention. It proposes a dual-encoder multi-task UNet that jointly optimizes disease classification and prediction of eye-tracking visual attention maps. The two encoders are complementary: a DenseNet201 backbone and an encoder built from Residual and Squeeze-and-Excitation blocks extract diverse features for attention map prediction, while a multi-scale feature-fusion classifier performs disease classification. To handle the asynchronous training schedules of the individual tasks, the authors introduce a multi-stage cooperative learning strategy with contrastive pretraining of the feature encoder. By integrating clinicians' eye gaze into the model, the approach aims to combine strong discriminative performance with potential gains in explainability. The method reports an AUC of 0.93 for disease classification and a Pearson correlation coefficient of 0.58 for visual attention map prediction, significantly outperforming existing techniques on both tasks.
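To make the two reported metrics concrete, here is a minimal pure-Python sketch of how an AUC and a Pearson correlation coefficient between a predicted and a reference attention map are commonly computed. It is an illustration only, not the authors' evaluation code: the function names `pearson_cc` and `roc_auc` are hypothetical, the AUC is shown for a single binary label (a multi-class setup would typically average it per class), and returning 0.0 for a constant map is a convention chosen here.

```python
import math


def pearson_cc(pred, target):
    """Pearson correlation between two equally sized 2D maps (lists of rows)."""
    p = [v for row in pred for v in row]      # flatten predicted map
    t = [v for row in target for v in row]    # flatten reference map
    if len(p) != len(t) or not p:
        raise ValueError("maps must be non-empty and the same size")
    mean_p, mean_t = sum(p) / len(p), sum(t) / len(t)
    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_t = sum((b - mean_t) ** 2 for b in t)
    if var_p == 0 or var_t == 0:
        return 0.0  # assumption: treat a constant map as uncorrelated
    return cov / math.sqrt(var_p * var_t)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC above is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on a full test set.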

📝 Abstract

Purpose: As visual inspection is an inherent process during radiological screening, the associated eye gaze data can provide valuable insights into relevant clinical decisions. As deep learning has become the state-of-the-art for computer-assisted diagnosis, integrating human behavior, such as eye gaze data, into these systems is instrumental to help align machine predictions with clinical diagnostic criteria, thus enhancing the quality of automatic radiological diagnosis.

Methods: We propose a novel deep learning framework for joint disease diagnosis and prediction of corresponding clinical visual attention maps for chest X-ray scans. Specifically, we introduce a new dual-encoder multi-task UNet, which leverages both a DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based encoder to extract diverse features for visual attention map prediction, and a multi-scale feature-fusion classifier to perform disease classification. To tackle the issue of asynchronous training schedules of individual tasks in multi-task learning, we propose a multi-stage cooperative learning strategy, with contrastive learning for feature encoder pretraining to boost performance.

Results: Our proposed method is shown to significantly outperform existing techniques for chest X-ray diagnosis (AUC = 0.93) and for the quality of visual attention map prediction (correlation coefficient = 0.58).

Conclusion: Benefiting from the proposed multi-task multi-stage cooperative learning, our technique demonstrates the benefit of integrating clinicians' eye gaze into clinical AI systems to boost performance and potentially explainability.
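The Squeeze-and-Excitation mechanism named in the methods can be sketched in a few lines. The following is a generic, framework-free illustration of a standard SE block (squeeze by global average pooling, excitation through two fully connected layers, then channel-wise rescaling). It is not the paper's encoder: the function name `se_block`, the omission of biases, and the weight shapes are assumptions made for this example, and a real model would use a deep learning framework with learned weights.

```python
import math


def se_block(feature_maps, w1, w2):
    """Rescale the channels of `feature_maps`, a C x H x W nested list.

    w1: (C // r) x C weights of the reduction layer (r = reduction ratio).
    w2: C x (C // r) weights of the expansion layer.
    Biases are omitted for brevity (assumption for this sketch).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation, step 1: fully connected reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w1
    ]
    # Excitation, step 2: fully connected expansion followed by a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates
```

The gates let the network emphasize informative channels and suppress others, which is one way an encoder of this kind can complement the features of a DenseNet-style backbone.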

Problem

Research questions and friction points this paper is trying to address.

Enhance chest X-ray diagnosis

Predict clinical visual attention

Integrate eye gaze data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-encoder multi-task UNet for joint diagnosis and visual attention map prediction

Multi-stage cooperative learning strategy

Contrastive learning for encoder pretraining
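As a rough illustration of contrastive pretraining, the sketch below computes an InfoNCE-style loss over paired embeddings in pure Python. The source does not state which contrastive objective the authors use, so the choice of InfoNCE, the function names `cosine` and `info_nce`, and the temperature value are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of embedding pairs.

    anchors[i] and positives[i] are two views of the same sample; every
    other entry of `positives` acts as a negative for anchors[i].
    The temperature is a hypothetical default, not a value from the paper.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching pair as the correct class.
        losses.append(log_sum - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when each anchor is most similar to its own positive and large when it is closer to another sample's embedding, which is the signal an encoder would be pretrained on before the later task-specific stages.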
