🤖 AI Summary
Existing methods predominantly generate radiology reports from single-view X-ray images, leading to diagnostic bias due to insufficient anatomical information. To address this, we propose the first chest X-ray report generation framework leveraging dual orthogonal views: posteroanterior (PA) and left-lateral (LL). Our approach introduces two key innovations: (1) multi-view enhanced contrastive learning to achieve fine-grained cross-modal alignment between images and text; and (2) a symptom-missing-aware semantic bridging module that mitigates embedding shifts caused by patient-specific knowledge gaps. Evaluated on four standard benchmarks, including MIMIC-CXR, our method achieves new state-of-the-art performance: +5.0% RadGraph F1, +8.2% CheXbert F1, and +7.3% / +3.1% improvements in BLEU-1 and BLEU-4, respectively. These gains demonstrate substantial enhancements in clinical relevance and linguistic coherence.
📝 Abstract
Radiology reports are crucial for planning treatment strategies and enhancing doctor-patient communication, yet manually writing these reports is burdensome for radiologists. While automatic report generation offers a solution, existing methods often rely on single-view radiographs, limiting diagnostic accuracy. To address this problem, we propose MCL, a Multi-view enhanced Contrastive Learning method for chest X-ray report generation. Specifically, we first introduce multi-view enhanced contrastive learning for visual representation by maximizing agreement between multi-view radiographs and their corresponding report. Subsequently, to fully exploit patient-specific indications (e.g., a patient's symptoms) for report generation, we add a transitional "bridge" for missing indications to reduce embedding space discrepancies caused by their presence or absence. Additionally, we construct the Multi-view CXR and Two-view CXR datasets from public sources to support research on multi-view report generation. Our proposed MCL surpasses recent state-of-the-art methods across multiple datasets, achieving a 5.0% RadGraph F1 improvement on MIMIC-CXR, a 7.3% BLEU-1 improvement on MIMIC-ABN, a 3.1% BLEU-4 improvement on Multi-view CXR, and an 8.2% CheXbert F1 improvement on Two-view CXR.
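The core idea of the abstract, maximizing agreement between a study's multi-view radiographs and its report, can be illustrated with a small sketch. This is not the paper's implementation: the mean-pooling fusion of the PA/LL view embeddings, the symmetric-in-batch InfoNCE objective, and the temperature value are all illustrative assumptions; the paper's actual loss and fusion strategy may differ.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def fuse_views(view_embs):
    # Hypothetical fusion: element-wise mean of the per-view
    # embeddings (e.g., PA and LL) into one study-level embedding.
    n = len(view_embs)
    return [sum(xs) / n for xs in zip(*view_embs)]

def info_nce(image_embs, report_embs, temperature=0.07):
    # InfoNCE-style contrastive loss: the matched (image, report)
    # pair in each batch row is the positive; all other reports in
    # the batch serve as negatives.
    n = len(image_embs)
    loss = 0.0
    for i in range(n):
        sims = [cosine(image_embs[i], r) / temperature for r in report_embs]
        m = max(sims)  # log-sum-exp stabilization
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += -(sims[i] - log_denom)
    return loss / n

# Toy batch of two studies: each image embedding stands in for a
# fused multi-view representation, paired with its report embedding.
studies = [fuse_views([[1.0, 0.0], [1.0, 0.2]]),
           fuse_views([[0.0, 1.0], [0.2, 1.0]])]
reports = [[1.0, 0.1], [0.1, 1.0]]
loss = info_nce(studies, reports)
```

A lower loss means the fused multi-view image embeddings sit closer to their own reports than to other patients' reports in the shared embedding space, which is the alignment property the method optimizes for.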