Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation

📅 2025-02-27
🤖 AI Summary
Existing chest X-ray report generation methods predominantly rely on single-view static images, limiting their capacity to capture spatial disease characteristics and longitudinal progression patterns. To address this, the authors propose MLRG, a framework that jointly models multi-view and longitudinal data. First, it introduces multi-view longitudinal contrastive learning, which integrates the multi-view images of the current visit with temporal information from longitudinal data. Second, it uses the spatiotemporal information inherent in radiology reports to supervise the pre-training of visual and textual representations. Third, it applies tokenized absence encoding to flexibly handle missing patient-specific prior knowledge. Compared with recent state-of-the-art methods, MLRG improves BLEU-4 by 2.3% on MIMIC-CXR, F1 score by 5.5% on MIMIC-ABN, and F1 RadGraph by 2.7% on Two-view CXR.

📝 Abstract
Automated radiology report generation offers an effective solution to alleviate radiologists' workload. However, most existing methods focus primarily on single or fixed-view images to model current disease conditions, which limits diagnostic accuracy and overlooks disease progression. Although some approaches utilize longitudinal data to track disease progression, they still rely on single images to analyze current visits. To address these issues, we propose enhanced contrastive learning with Multi-view Longitudinal data to facilitate chest X-ray Report Generation, named MLRG. Specifically, we introduce a multi-view longitudinal contrastive learning method that integrates spatial information from current multi-view images and temporal information from longitudinal data. This method also utilizes the inherent spatiotemporal information of radiology reports to supervise the pre-training of visual and textual representations. Subsequently, we present a tokenized absence encoding technique to flexibly handle missing patient-specific prior knowledge, allowing the model to produce more accurate radiology reports based on available prior knowledge. Extensive experiments on MIMIC-CXR, MIMIC-ABN, and Two-view CXR datasets demonstrate that our MLRG outperforms recent state-of-the-art methods, achieving a 2.3% BLEU-4 improvement on MIMIC-CXR, a 5.5% F1 score improvement on MIMIC-ABN, and a 2.7% F1 RadGraph improvement on Two-view CXR.
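The abstract describes the contrastive objective only at a high level, so the following is a minimal sketch of the general idea rather than the MLRG implementation. It assumes that each study's image features (current views, and optionally prior images) are fused by simple mean pooling and that images and reports are aligned with a standard symmetric InfoNCE loss. The function names, the pooling choice, and the temperature value are illustrative assumptions, not details from the paper.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fuse_views(views):
    """Average the feature vectors of whichever views a study has.

    Assumption: mean pooling stands in for the paper's fusion of
    multi-view and longitudinal features, which the abstract does not specify.
    """
    dim = len(views[0])
    return [sum(v[i] for v in views) / len(views) for i in range(dim)]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_info_nce(visual, text, temperature=0.07):
    """Symmetric InfoNCE: the i-th image embedding should match the i-th report."""
    v = [normalize(x) for x in visual]
    t = [normalize(x) for x in text]
    n = len(v)
    # Cosine similarity between every image/report pair, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-report direction: softmax over each row.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Report-to-image direction: softmax over each column.
    columns = [list(col) for col in zip(*logits)]
    t2v = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (v2t + t2v)

# Toy batch: three studies with two, one, and three available views.
studies = [
    [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    [[0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9], [0.0, 0.1, 1.0]],
]
reports = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
visual = [fuse_views(s) for s in studies]
loss = symmetric_info_nce(visual, reports)
```

Because the pooling accepts any number of views per study, the same loss can be computed for patients with one image or several. A real system would use learned encoders and a learned fusion module in place of the hand-written vectors.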
Problem

Research questions and friction points this paper is trying to address.

Enhances chest X-ray report generation accuracy
Integrates multi-view and longitudinal data analysis
Handles missing prior knowledge in diagnostics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-view longitudinal contrastive learning
Tokenized absence encoding technique
Spatiotemporal information from reports used to supervise visual and textual pre-training
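One way to picture tokenized absence encoding is as a placeholder token that stands in for any piece of prior knowledge a patient record lacks, so the model always sees the same input layout. The sketch below is an illustration under stated assumptions, not the paper's implementation: the field names (indication, prior report), the special tokens, and the whitespace tokenizer are all hypothetical.

```python
from typing import List, Optional

# Hypothetical placeholder tokens; the source does not name the actual ones.
ABSENT_TOKENS = {
    "indication": "[NO_INDICATION]",
    "prior_report": "[NO_PRIOR_REPORT]",
}

def build_prior_tokens(indication: Optional[str],
                       prior_report: Optional[str]) -> List[str]:
    """Build a token sequence in which missing fields become explicit tokens."""
    tokens: List[str] = []
    fields = (("indication", indication), ("prior_report", prior_report))
    for name, text in fields:
        tokens.append(f"[{name.upper()}]")  # marker for the start of the field
        if text and text.strip():
            # Assumption: a naive whitespace tokenizer, for illustration only.
            tokens.extend(text.lower().split())
        else:
            # The field is missing or empty, so emit its absence token instead.
            tokens.append(ABSENT_TOKENS[name])
    return tokens

# A first visit: an indication is available but there is no prior report.
first_visit = build_prior_tokens("Chest pain", None)
```

With this layout, a first visit and a follow-up visit produce sequences with the same field order, and the absence token gives the model an explicit signal that information is missing rather than an empty span.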
Kang Liu
School of Computer Science and Technology, Xidian University, Xi’an, China; Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, China; Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China
Zhuoqi Ma
Xidian University
Computer vision
Xiaolu Kang
School of Computer Science and Technology, Xidian University, Xi’an, China; Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, China; Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China
Yunan Li
School of Computer Science and Technology, Xidian University, Xi’an, China; Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, China; Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China
Kun Xie
School of Computer Science and Technology, Xidian University, Xi’an, China; Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, China
Zhicheng Jiao
Brown University Health, Warren Alpert Medical School of Brown University
Medical image analysis, Health informatics
Qiguang Miao
School of Computer Science and Technology, Xidian University, Xi’an, China; Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an, China; Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an, China