A Disease-Aware Dual-Stage Framework for Chest X-ray Report Generation

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current chest X-ray report generation methods suffer from weak disease-aware visual representations and insufficient vision–language alignment, causing them to overlook critical pathological features and limiting clinical accuracy. To address this, we propose a dual-stage disease-aware framework. In Stage 1, the model learns Disease-Aware Semantic Tokens (DASTs) for specific pathology categories via cross-attention and multi-label classification, while aligning vision and language representations through contrastive learning. In Stage 2, a Disease-Visual Attention Fusion (DVAF) module integrates disease-aware representations with visual features, and a Dual-Modal Similarity Retrieval (DMSR) mechanism combines visual and disease-specific similarities to retrieve relevant historical image–report pairs that guide generation. Evaluated on CheXpert Plus, IU X-ray, and MIMIC-CXR, the framework achieves state-of-the-art performance, with significant improvements in pathology coverage, diagnostic accuracy, and linguistic fluency.

📝 Abstract
Radiology report generation from chest X-rays is an important task in artificial intelligence with the potential to greatly reduce radiologists' workload and shorten patient wait times. Despite recent advances, existing approaches often lack sufficient disease-awareness in visual representations and adequate vision-language alignment to meet the specialized requirements of medical image analysis. As a result, these models usually overlook critical pathological features on chest X-rays and struggle to generate clinically accurate reports. To address these limitations, we propose a novel dual-stage disease-aware framework for chest X-ray report generation. In Stage 1, our model learns Disease-Aware Semantic Tokens (DASTs) corresponding to specific pathology categories through cross-attention mechanisms and multi-label classification, while simultaneously aligning vision and language representations via contrastive learning. In Stage 2, we introduce a Disease-Visual Attention Fusion (DVAF) module to integrate disease-aware representations with visual features, along with a Dual-Modal Similarity Retrieval (DMSR) mechanism that combines visual and disease-specific similarities to retrieve relevant exemplars, providing contextual guidance during report generation. Extensive experiments on benchmark datasets (i.e., CheXpert Plus, IU X-ray, and MIMIC-CXR) demonstrate that our disease-aware framework achieves state-of-the-art performance in chest X-ray report generation, with significant improvements in clinical accuracy and linguistic quality.
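The Stage 1 mechanism described above (disease tokens querying visual features via cross-attention, then a per-pathology multi-label head) can be sketched roughly as follows. This is an illustrative numpy sketch, not the authors' implementation: the 14-class count, patch count, feature dimension, and all weights are assumed placeholders, and the real model would learn these parameters end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 14 CheXpert-style pathology classes, 49 visual patches, 64-dim features.
num_diseases, num_patches, dim = 14, 49, 64

disease_tokens = rng.standard_normal((num_diseases, dim))  # learnable DASTs (random here)
visual_feats = rng.standard_normal((num_patches, dim))     # patch features from an image encoder

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Cross-attention: each disease token queries the visual patches.
scores = disease_tokens @ visual_feats.T / np.sqrt(dim)  # (14, 49) attention logits
attn = softmax(scores, axis=-1)                          # rows sum to 1
disease_aware = attn @ visual_feats                      # (14, 64) disease-aware features

# Multi-label head: one sigmoid logit per pathology (placeholder weights).
w = rng.standard_normal((num_diseases, dim))
logits = (disease_aware * w).sum(axis=1)
probs = 1.0 / (1.0 + np.exp(-logits))                    # per-disease presence probabilities
```

In training, `probs` would feed a binary cross-entropy loss against the image's pathology labels, which is what ties each token to its disease category; the contrastive vision-language alignment mentioned in the abstract is a separate objective not shown here.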
Problem

Research questions and friction points this paper is trying to address.

Existing models lack disease-awareness in visual representations for medical imaging
Current approaches have inadequate vision-language alignment for clinical accuracy
Models overlook critical pathological features in chest X-ray analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disease-Aware Semantic Tokens (DASTs) tied to pathology categories, learned via cross-attention and multi-label classification
Disease-Visual Attention Fusion (DVAF) integrates disease-aware representations with visual features
Dual-Modal Similarity Retrieval (DMSR) combines visual and disease-specific similarities to retrieve exemplar reports
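The DMSR idea of scoring historical image-report pairs by a blend of visual and disease-specific similarity can be sketched as below. All specifics here are assumptions for illustration: the embedding sizes, the cosine-similarity choice for both modalities, and the 0.5 mixing weight `alpha` are not values given by the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    """Row-wise cosine similarity between (m, d) and (n, d) arrays -> (m, n)."""
    return (a @ b.T) / (
        np.linalg.norm(a, axis=-1, keepdims=True)
        * np.linalg.norm(b, axis=-1, keepdims=True).T
    )

# Query image: a pooled visual embedding and a 14-dim disease probability vector.
q_vis = rng.standard_normal((1, 64))
q_dis = rng.random((1, 14))

# Hypothetical retrieval corpus of 100 historical image-report pairs.
db_vis = rng.standard_normal((100, 64))
db_dis = rng.random((100, 14))

alpha = 0.5  # assumed weighting between visual and disease-specific similarity
score = alpha * cosine(q_vis, db_vis) + (1 - alpha) * cosine(q_dis, db_dis)  # (1, 100)

top_k = np.argsort(-score[0])[:3]  # indices of the 3 most similar exemplars
```

The reports attached to the `top_k` exemplars would then be supplied as contextual guidance to the report decoder; mixing in the disease-probability similarity keeps retrieval anchored to pathology content rather than visual appearance alone.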
Puzhen Wu
Cornell University
Medical AI, Bioinformatics

Hexin Dong
Postdoctoral Associate at Weill Cornell Medicine

Yi Lin
Population Health Sciences, Weill Cornell Medicine, New York, NY, USA

Yihao Ding
The University of Western Australia
Multimodal Learning, Document Understanding, Interdisciplinary AI

Yifan Peng
Population Health Sciences, Weill Cornell Medicine, New York, NY, USA