🤖 AI Summary
Existing methods predominantly generate radiology reports from single-view X-ray images, leading to diagnostic bias due to insufficient anatomical information. To address this, we propose the first chest X-ray report generation framework leveraging dual orthogonal views: posteroanterior (PA) and left-lateral (LL). Our approach introduces two key innovations: (1) multi-view enhanced contrastive learning to achieve fine-grained cross-modal alignment between images and text; and (2) a symptom-missing-aware semantic bridging module that mitigates embedding shifts caused by patient-specific knowledge gaps. Evaluated on four standard benchmarks, including MIMIC-CXR, our method achieves new state-of-the-art performance: +5.0% RadGraph F1, +8.2% CheXbert F1, and +7.3% / +3.1% improvements in BLEU-1 and BLEU-4, respectively. These gains demonstrate substantial enhancements in clinical relevance and linguistic coherence.
📝 Abstract
Radiology reports are crucial for planning treatment strategies and enhancing doctor-patient communication, yet manually writing these reports is burdensome for radiologists. While automatic report generation offers a solution, existing methods often rely on single-view radiographs, limiting diagnostic accuracy. To address this problem, we propose MCL, a Multi-view enhanced Contrastive Learning method for chest X-ray report generation. Specifically, we first introduce multi-view enhanced contrastive learning for visual representation by maximizing agreement between multi-view radiographs and their corresponding report. Subsequently, to fully exploit patient-specific indications (e.g., a patient's symptoms) for report generation, we add a transitional "bridge" for missing indications to reduce embedding space discrepancies caused by their presence or absence. Additionally, we construct the Multi-view CXR and Two-view CXR datasets from public sources to support research on multi-view report generation. Our proposed MCL surpasses recent state-of-the-art methods across multiple datasets, achieving a 5.0% RadGraph F1 improvement on MIMIC-CXR, a 7.3% BLEU-1 improvement on MIMIC-ABN, a 3.1% BLEU-4 improvement on Multi-view CXR, and an 8.2% CheXbert F1 improvement on Two-view CXR.
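The core idea of the abstract, maximizing agreement between a study's multi-view radiographs and its report, can be illustrated with a small sketch. This is not the paper's implementation: the mean-pooling fusion of the PA/LL view embeddings, the symmetric-in-batch InfoNCE objective, and the temperature value are all illustrative assumptions; the paper's actual loss and fusion strategy may differ.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fuse_views(view_embs):
    # Hypothetical fusion: element-wise mean of the per-view
    # embeddings (e.g., PA and LL) into one study-level embedding.
    n = len(view_embs)
    return [sum(xs) / n for xs in zip(*view_embs)]

def info_nce(image_embs, report_embs, temperature=0.07):
    # InfoNCE-style contrastive loss: the matched (image, report)
    # pair in each batch row is the positive; all other reports in
    # the batch serve as negatives.
    n = len(image_embs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(image_embs[i], r) / temperature for r in report_embs]
        m = max(sims)  # log-sum-exp stabilization
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += -(sims[i] - log_denom)
    return loss / n

# Toy batch of two studies: each image embedding stands in for a
# fused multi-view representation, paired with its report embedding.
studies = [fuse_views([[1.0, 0.0], [1.0, 0.2]]),
           fuse_views([[0.0, 1.0], [0.2, 1.0]])]
reports = [[1.0, 0.1], [0.1, 1.0]]
loss = info_nce(studies, reports)
```

A lower loss means the fused multi-view image embeddings sit closer to their own reports than to other patients' reports in the shared embedding space, which is the alignment property the method optimizes for.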