LUMEN: Longitudinal Multi-Modal Radiology Model for Prognosis and Diagnosis

📅 2026-02-24
🤖 AI Summary
This work addresses the difficulty radiologists face when manually comparing longitudinal medical images such as chest X-rays: the comparison is time-consuming, and prognosis is hard to assess from it. The authors propose LUMEN, a training framework accompanied by a novel instruction-following dataset built from longitudinal studies. Through multi-image joint modeling and multi-task instruction tuning, LUMEN adapts large vision-language models to perform both diagnostic reasoning and prognosis prediction. The approach combines multimodal alignment, longitudinal image modeling, and visual question answering, and reports significant gains over baseline models on diagnostic VQA tasks using the MIMIC-CXR and Medical-Diff-VQA datasets. It also shows promising potential on prognosis-focused visual question answering.

📝 Abstract
Large vision-language models (VLMs) have evolved from general-purpose applications to specialized use cases such as the clinical domain, where they show potential for decision support in radiology. One promising application is assisting radiologists by analyzing imaging data such as chest X-rays (CXR) through a visual and natural-language question-answering (VQA) interface. When longitudinal imaging is available, radiologists analyze temporal changes, which are essential for accurate diagnosis and prognosis. Manual longitudinal analysis is time-consuming, which motivates a training framework that can provide prognostic capabilities. We introduce LUMEN, a novel training framework optimized for longitudinal CXR interpretation that leverages multi-image and multi-task instruction fine-tuning to enhance prognostic and diagnostic performance. We conduct experiments on the publicly available MIMIC-CXR dataset and its associated Medical-Diff-VQA dataset. We further formulate and construct a novel instruction-following dataset incorporating longitudinal studies, enabling the development of a prognostic VQA task. Our method demonstrates significant improvements over baseline models in diagnostic VQA tasks and, more importantly, shows promising potential for prognostic capabilities. These results underscore the value of well-designed, instruction-tuned VLMs in enabling more accurate and clinically meaningful interpretation of longitudinal radiological imaging data.
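To make the idea of a longitudinal instruction-following dataset more concrete, the following is a minimal sketch of how prior and follow-up studies might be paired into multi-image, multi-task VQA samples. It is not the authors' pipeline: the `Study` record, the `build_longitudinal_samples` helper, the question wording, and the way the prognostic answer is taken from the follow-up study are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class Study:
    """One imaging study for a patient (hypothetical schema)."""
    patient_id: str
    study_date: date
    image_path: str
    findings: List[str]


def build_longitudinal_samples(studies: List[Study]) -> List[Dict]:
    """Pair each study with the same patient's next study and emit one
    diagnostic and one prognostic instruction sample per pair."""
    by_patient: Dict[str, List[Study]] = {}
    for study in studies:
        by_patient.setdefault(study.patient_id, []).append(study)

    samples: List[Dict] = []
    for group in by_patient.values():
        group.sort(key=lambda s: s.study_date)  # chronological order
        for prior, current in zip(group, group[1:]):
            # Diagnostic task (assumed form): both images are inputs and
            # the answer lists findings that are new in the later study.
            new = sorted(set(current.findings) - set(prior.findings))
            samples.append({
                "task": "diagnosis",
                "images": [prior.image_path, current.image_path],
                "question": "What has changed compared with the prior image?",
                "answer": ", ".join(new) if new else "no new findings",
            })
            # Prognostic task (assumed form): only the earlier image is an
            # input; the answer is taken from the follow-up study.
            samples.append({
                "task": "prognosis",
                "images": [prior.image_path],
                "question": "Which findings are likely at follow-up?",
                "answer": ", ".join(sorted(current.findings)) or "no findings",
            })
    return samples
```

Patients with a single study produce no samples in this sketch, since at least two time points are needed to form a pair.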
Problem

Research questions and friction points this paper is trying to address.

longitudinal imaging
prognosis
diagnosis
radiology
vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

longitudinal imaging
vision-language model
instruction fine-tuning
prognostic VQA
multi-modal radiology
Zhifan Jiang
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington DC, USA
Dong Yang
Nvidia Corporation, Santa Clara, CA, USA
Vishwesh Nath
NVIDIA
Medical Image Analysis, Image Processing, Machine Learning
Abhijeet Parida
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington DC, USA; ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
Nishad P. Kulkarni
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington DC, USA
Ziyue Xu
NVIDIA
Medical Image Analysis, Computer Vision, Federated Learning
Daguang Xu
Senior Research Manager at NVIDIA
Deep Learning, Machine Learning, Medical Image Analysis, Compressive Sensing, Sparse coding
Syed Muhammad Anwar
Children’s National Hospital / George Washington University
Biomedical Signal processing, medical image analysis, graph learning, self-supervised learning
Holger R. Roth
NVIDIA
Medical image processing - Computer-aided Detection, CT Colonography - Registration
Marius George Linguraru
Sheikh Zayed Institute for Pediatric Surgical Innovation, Children’s National Hospital, Washington DC, USA; School of Medicine and Health Sciences, George Washington University, Washington DC, USA