A Generalist Learner for Multifaceted Medical Image Interpretation

📅 2024-05-13
🏛️ arXiv.org
📈 Citations: 29
Influential: 3
🤖 AI Summary
Weak generalization and poor task adaptability hinder the clinical deployment of existing medical AI systems. To address this, we propose MedVersa, a generalist generative learner for multimodal medical image interpretation. Its architecture uses a large language model as a learnable orchestrator to unify visual and linguistic supervision, dynamic task specification, and multimodal input and output handling. We construct MedInterp, the largest multimodal dataset to date for medical image interpretation (over 13 million annotated instances, 11 tasks, 3 modalities), to support multi-task joint training. MedVersa achieves state-of-the-art performance on 9 tasks, sometimes outperforming specialist models by over 10%. This work is the first to show that a multimodal generative medical AI system can handle multimodal inputs, multimodal outputs, and dynamic task specification.

📝 Abstract
Current medical artificial intelligence systems are often limited to narrow applications, hindering their widespread adoption in clinical practice. To address this limitation, we propose MedVersa, a generalist learner that enables flexible learning and tasking for medical image interpretation. By leveraging a large language model as a learnable orchestrator, MedVersa can learn from both visual and linguistic supervision, support multimodal inputs, and perform real-time task specification. This versatility allows MedVersa to adapt to various clinical scenarios and perform multifaceted medical image analysis. We introduce MedInterp, the largest multimodal dataset to date for medical image interpretation, consisting of over 13 million annotated instances spanning 11 tasks across 3 modalities, to support the development of MedVersa. Our experiments demonstrate that MedVersa achieves state-of-the-art performance in 9 tasks, sometimes outperforming specialist counterparts by over 10%. MedVersa is the first to showcase the viability of multimodal generative medical AI in implementing multimodal outputs, inputs, and dynamic task specification, highlighting its potential as a multifunctional system for comprehensive medical image analysis. This generalist approach to medical image interpretation paves the way for more adaptable and efficient AI-assisted clinical decision-making.
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations of narrow medical AI applications
Achieving competitive performance across diverse imaging scenarios
Reducing reporting time and discrepancies in clinical settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generalist foundation model for medical imaging
Learning with multimodal inputs and outputs
Competitive performance with specialized solutions
Authors
Hong-Yu Zhou
Assistant Professor of Biomedical Engineering, Tsinghua University. Past: Harvard Medical School.
AI for Healthcare, AI for Medicine, Biomedical AI
Subathra Adithan
Jawaharlal Institute of Postgraduate Medical Education and Research, Puducherry, IN.
J. N. Acosta
Department of Biomedical Informatics, Harvard Medical School, Boston, USA.
E. Topol
Scripps Research Translational Institute, Scripps Research, La Jolla, CA, USA.
P. Rajpurkar
Department of Biomedical Informatics, Harvard Medical School, Boston, USA.