PriorRG: Prior-Guided Contrastive Pre-training and Coarse-to-Fine Decoding for Chest X-ray Report Generation

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing chest X-ray report generation methods predominantly rely on single-image inputs, neglecting patient-specific clinical context and recent imaging history, both critical priors for modeling disease progression and diagnostic intent. To address this limitation, we propose a patient-specific report generation framework that integrates clinical-context-guided contrastive pre-training with a prior-aware coarse-to-fine decoding mechanism. Our approach jointly models spatiotemporal image dynamics and clinical textual semantics through visual-encoder hidden-state fusion, multimodal feature alignment, and staged training. Evaluated on the MIMIC-CXR and MIMIC-ABN benchmarks, the method achieves substantial gains (+3.6% BLEU-4 and +3.8% F1 on MIMIC-CXR; +5.9% BLEU-1 on MIMIC-ABN), demonstrating improved clinical relevance and diagnostic consistency in the generated reports.

📝 Abstract
Chest X-ray report generation aims to reduce radiologists' workload by automatically producing high-quality preliminary reports. A critical yet underexplored aspect of this task is the effective use of patient-specific prior knowledge, including clinical context (e.g., symptoms, medical history) and the most recent prior image, which radiologists routinely rely on for diagnostic reasoning. Most existing methods generate reports from single images, neglecting this essential prior information and thus failing to capture diagnostic intent or disease progression. To bridge this gap, we propose PriorRG, a novel chest X-ray report generation framework that emulates real-world clinical workflows via a two-stage training pipeline. In Stage 1, we introduce a prior-guided contrastive pre-training scheme that leverages clinical context to guide spatiotemporal feature extraction, aligning the model more closely with the intrinsic spatiotemporal semantics of radiology reports. In Stage 2, we present a prior-aware coarse-to-fine decoding scheme that progressively integrates patient-specific prior knowledge with the vision encoder's hidden states. This allows the model to follow the diagnostic focus and track disease progression, thereby enhancing the clinical accuracy and fluency of the generated reports. Extensive experiments on the MIMIC-CXR and MIMIC-ABN datasets demonstrate that PriorRG outperforms state-of-the-art methods, achieving a 3.6% BLEU-4 and 3.8% F1 score improvement on MIMIC-CXR, and a 5.9% BLEU-1 gain on MIMIC-ABN. Code and checkpoints will be released upon acceptance.
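The Stage 1 idea of contrastive pre-training that aligns fused image features with report text can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the fusion rule (concatenating the current features with their change relative to the prior study) and all function names are assumptions; a standard symmetric InfoNCE objective stands in for the paper's prior-guided loss.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def fuse_with_prior(current, prior):
    """Toy spatiotemporal fusion: keep the current-study features and
    append the change relative to the prior study (an assumption; the
    paper fuses vision-encoder hidden states, not raw vectors)."""
    return current + [c - p for c, p in zip(current, prior)]

def info_nce(image_feats, report_feats, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th image feature should score
    highest against the i-th report embedding, in both directions."""
    n = len(image_feats)
    sims = [[cosine(img, rep) / temperature for rep in report_feats]
            for img in image_feats]
    loss = 0.0
    for i in range(n):
        row = sims[i]                          # image -> report
        col = [sims[j][i] for j in range(n)]   # report -> image
        loss -= math.log(math.exp(row[i]) / sum(math.exp(s) for s in row))
        loss -= math.log(math.exp(col[i]) / sum(math.exp(s) for s in col))
    return loss / (2 * n)
```

As expected of a contrastive objective, matched image-report pairs yield a near-zero loss while mismatched pairs are penalized, which is the mechanism the abstract credits for pulling visual features toward the spatiotemporal semantics of reports.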
Problem

Research questions and friction points this paper is trying to address.

Utilizing patient-specific prior knowledge for chest X-ray reports
Generating reports considering diagnostic intent and disease progression
Improving clinical accuracy and fluency in automated report generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prior-guided contrastive pre-training for feature alignment
Coarse-to-fine decoding integrating prior knowledge
Two-stage training mimicking clinical workflows
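The second bullet, coarse-to-fine decoding that progressively mixes patient-specific priors into the vision encoder's hidden states, can be illustrated with a single gated fusion step. This is a minimal sketch under stated assumptions: the sigmoid gate and the `relevance` input are hypothetical stand-ins for whatever learned, context-conditioned gating the paper actually uses.

```python
import math

def sigmoid(x):
    """Logistic function mapping a score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def coarse_to_fine_fusion(current_states, prior_states, relevance):
    """Blend each hidden-state dimension between the current study and
    the prior study. `relevance` is a per-dimension score (hypothetical;
    in practice it would be derived from the clinical context): positive
    values shift weight toward the prior study, negative values toward
    the current one."""
    fused = []
    for c, p, r in zip(current_states, prior_states, relevance):
        g = sigmoid(r)
        fused.append(g * p + (1.0 - g) * c)
    return fused
```

A gate like this lets the decoder attend to prior-study evidence only where the clinical context deems it relevant, which matches the stated goal of tracking disease progression without drowning out the current image.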
Authors
Kang Liu
School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China
Zhuoqi Ma
Xidian University
Zikang Fang
School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China
Yunan Li
School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China
Kun Xie
School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China
Qiguang Miao
School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi 710071, China