RadVLM: A Multitask Conversational Vision-Language Model for Radiology

📅 2025-02-05
🤖 AI Summary
Current vision-language models (VLMs) lack interactive diagnostic capabilities for chest X-ray (CXR) analysis. To address this gap, we propose the first lightweight, multi-task VLM tailored to radiological clinical needs, supporting report generation, abnormality classification, visual grounding, and multi-turn, multi-task dialogue. We introduce a radiology-specific instruction-tuning paradigm that jointly optimizes single-turn discriminative and generative tasks alongside multi-turn dialogue. To enable this, we construct a large-scale CXR instruction dataset comprising over one million instruction-response pairs, with both single-turn and multi-turn annotations. Experiments show state-of-the-art performance on dialogue understanding and visual grounding, while remaining competitive across other radiological tasks. Ablation studies confirm that joint multi-task training significantly enhances generalization and robustness in few-shot settings.

📝 Abstract
The widespread use of chest X-rays (CXRs), coupled with a shortage of radiologists, has driven growing interest in automated CXR analysis and AI-assisted reporting. While existing vision-language models (VLMs) show promise in specific tasks such as report generation or abnormality detection, they often lack support for interactive diagnostic capabilities. In this work we present RadVLM, a compact, multitask conversational foundation model designed for CXR interpretation. To this end, we curate a large-scale instruction dataset comprising over 1 million image-instruction pairs containing both single-turn tasks -- such as report generation, abnormality classification, and visual grounding -- and multi-turn, multi-task conversational interactions. After fine-tuning RadVLM on this instruction dataset, we evaluate it across different tasks along with re-implemented baseline VLMs. Our results show that RadVLM achieves state-of-the-art performance in conversational capabilities and visual grounding while remaining competitive in other radiology tasks. Ablation studies further highlight the benefit of joint training across multiple tasks, particularly for scenarios with limited annotated data. Together, these findings highlight the potential of RadVLM as a clinically relevant AI assistant, providing structured CXR interpretation and conversational capabilities to support more effective and accessible diagnostic workflows.
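The abstract describes an instruction dataset mixing single-turn tasks (report generation, abnormality classification, visual grounding) with multi-turn, multi-task dialogues over the same image. A minimal sketch of what such records might look like is shown below; the field names, role labels, and example contents are illustrative assumptions, not RadVLM's actual data schema.

```python
# Hypothetical CXR instruction-tuning records, modeled only on the task types
# named in the abstract. All field names and values are illustrative
# assumptions, not the paper's actual format.

def make_single_turn(image_id: str, task: str, instruction: str, response: str) -> dict:
    """Build one single-turn instruction-response pair for a CXR image."""
    return {
        "image": image_id,
        "task": task,
        "conversation": [
            {"role": "user", "content": instruction},
            {"role": "assistant", "content": response},
        ],
    }

def make_multi_turn(image_id: str, turns: list[tuple[str, str]]) -> dict:
    """Build one multi-turn, multi-task dialogue grounded in the same image."""
    conversation = []
    for instruction, response in turns:
        conversation.append({"role": "user", "content": instruction})
        conversation.append({"role": "assistant", "content": response})
    return {"image": image_id, "task": "dialogue", "conversation": conversation}

# Example records (placeholder contents):
single = make_single_turn(
    "cxr_000001.png",
    "report_generation",
    "Write the findings section for this chest X-ray.",
    "Lungs are clear. No pleural effusion or pneumothorax.",
)
multi = make_multi_turn(
    "cxr_000002.png",
    [
        ("Is there cardiomegaly?", "Yes, the cardiac silhouette is enlarged."),
        ("Can you localize it?", "Bounding box [0.32, 0.55, 0.71, 0.93] (normalized)."),
    ],
)
```

Keeping single-turn and multi-turn records in one shared conversation format is what would let a single fine-tuning run jointly optimize all tasks, which is the benefit the ablation studies attribute to joint training.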
Problem

Research questions and friction points this paper is trying to address.

Automated CXR analysis and reporting
Interactive diagnostic capabilities in VLMs
Multitask conversational model for radiology
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multitask conversational foundation model
Large-scale instruction dataset curation
State-of-the-art conversational capabilities
Nicolas Deperrois
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Hidetoshi Matsuo
Kobe University
Medical Imaging, Deep Learning
Samuel Ruipérez-Campillo
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Moritz Vandenhirtz
PhD student, ETH Zurich
Generative Modeling, Interpretable Machine Learning, Computer Vision, Medical Data Science
Sonia Laguna
PhD student, ETH Zürich
Machine Learning, Generative Models, Interpretability
Alain Ryser
PhD Student, ETH
Computer Science, Medical Data Science, Machine Learning
Koji Fujimoto
Department of Advanced Imaging in Medical Magnetic Resonance, Kyoto University, Kyoto, Japan
Mizuho Nishio
Kyoto University
Medical Image Analysis, Machine Learning, Deep Learning, Radiology, Computer Vision
Thomas M. Sutter
Postdoc, ETH Zurich
Generative Models, Multimodal ML, Probabilistic ML, Representation Learning, ML for Healthcare
Julia E. Vogt
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Jonas Kluckert
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland; Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland
Thomas Frauenfelder
Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland
Christian Bluthgen
Diagnostic and Interventional Radiology, University Hospital Zurich, Zurich, Switzerland
F. Nooralahzadeh
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Michael Krauthammer
Michael Krauthammer
University of Zurich
Biomedical Informatics