MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis

📅 2024-03-22
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
This paper tackles the difficulty of combining chest X-ray images with incomplete electronic health records (EHRs) for clinical diagnosis. It proposes MedPromptX, which the authors describe as the first model to integrate multimodal large language models (MLLMs), few-shot prompting, and visual grounding for chest X-ray diagnosis. A dynamic few-shot refinement technique adapts the in-context examples to each new patient scenario, while visual grounding directs the model's attention to relevant regions of the X-ray to improve the identification of abnormalities. Few-shot prompting is also reported to reduce the need for extensive training and to help counter hallucination. The authors release MedPromptX-VQA, a new in-context visual question answering dataset, and report an 11% improvement in F1-score over the baselines. Code and data are publicly available.

📝 Abstract
Chest X-ray images are commonly used for predicting acute and chronic cardiopulmonary conditions, but efforts to integrate them with structured clinical data face challenges due to incomplete electronic health records (EHR). This paper introduces MedPromptX, the first model to integrate multimodal large language models (MLLMs), few-shot prompting (FP) and visual grounding (VG) to combine imagery with EHR data for chest X-ray diagnosis. A pre-trained MLLM is used to fill in missing EHR information, providing a comprehensive understanding of patients' medical history. Additionally, FP reduces the need for extensive training of MLLMs while effectively tackling the issue of hallucination. However, determining the optimal number of few-shot examples and selecting high-quality candidates can be burdensome, even though both choices profoundly influence model performance. Hence, we propose a new technique that dynamically refines few-shot data for real-time adjustment to new patient scenarios. Moreover, VG helps focus the model's attention on relevant regions of interest in X-ray images, enhancing the identification of abnormalities. We release MedPromptX-VQA, a new in-context visual question answering dataset encompassing interleaved image and EHR data derived from the MIMIC-IV and MIMIC-CXR databases. Results demonstrate the state-of-the-art (SOTA) performance of MedPromptX, which achieves an 11% improvement in F1-score compared to the baselines. Code and data are available at https://github.com/BioMedIA-MBZUAI/MedPromptX
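
The abstract does not spell out how few-shot data is refined, so the following is only an illustrative sketch of one plausible approach, not the authors' method: rank candidate examples by embedding similarity to the new patient, drop candidates below a similarity threshold, and cap the number of shots. The function names, the `max_shots` and `min_similarity` parameters, and the use of cosine similarity are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_few_shot(query_vec, candidates, max_shots=4, min_similarity=0.5):
    """Pick in-context examples for one patient (hypothetical helper).

    candidates: list of (example_id, embedding) pairs.
    Returns example ids ordered from most to least similar, keeping only
    those at or above min_similarity and at most max_shots of them.
    """
    scored = [(cosine(query_vec, vec), ex_id) for ex_id, vec in candidates]
    kept = [pair for pair in scored if pair[0] >= min_similarity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [ex_id for _, ex_id in kept[:max_shots]]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from an image/EHR encoder.
    pool = [("case_a", [1.0, 0.0]), ("case_b", [0.0, 1.0]), ("case_c", [1.0, 1.0])]
    print(select_few_shot([1.0, 0.0], pool))
```

Because both the threshold and the cap are applied per query, the number of examples can differ from patient to patient, which is one simple way to avoid fixing the shot count in advance.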

Problem

Research questions and friction points this paper is trying to address.

Chest X-ray
Medical Records Incompleteness
Image-Text Integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

MedPromptX
MLLM and Few-Shot Prompting
Visual Grounding