MedGemma 1.5 Technical Report

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes MedGemma 1.5 4B, the first multimodal medical foundation model capable of jointly processing 3D medical imaging, whole-slide pathology images, longitudinal X-rays, and electronic health records within a unified architecture. By integrating long-context 3D volumetric tiling, efficient sampling of gigapixel pathology images, anatomical structure bounding-box localization, temporal alignment of multi-phase imaging, and enhanced medical text understanding, the model substantially advances cross-modal clinical reasoning. Experimental results demonstrate significant performance gains: 11% and 3% absolute improvements in classification accuracy on 3D MRI and CT scans, respectively; a 47% increase in macro F1 score for pathology image analysis; a 35% improvement in chest X-ray localization IoU; and 5% and 22% accuracy boosts on MedQA and EHRQA benchmarks, respectively.
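The report does not spell out its long-context 3D volumetric tiling here, but a common way to feed a CT or MRI volume to a long-context multimodal model is to sample evenly spaced axial slices and pass them as an image sequence. The sketch below is illustrative only; the function name and slice count are our assumptions, not the paper's published procedure.

```python
import numpy as np

def sample_axial_slices(volume, n_slices=16):
    """Evenly sample axial slices from a 3D volume shaped (depth, H, W).

    Hedged sketch: MedGemma 1.5's actual slicing strategy is not
    detailed in this summary; this shows the general idea of turning
    a volume into a fixed-length slice sequence for a long-context model.
    """
    depth = volume.shape[0]
    idx = np.linspace(0, depth - 1, n_slices).round().astype(int)
    return volume[idx]  # (n_slices, H, W), usable as an image sequence

# Example: a synthetic 80-slice CT volume reduced to 16 slices.
ct = np.zeros((80, 256, 256), dtype=np.float32)
slices = sample_axial_slices(ct, n_slices=16)
print(slices.shape)  # (16, 256, 256)
```

Uniform sampling keeps the sequence length fixed regardless of scan depth, which is what makes a single long-context window workable across heterogeneous CT/MRI acquisitions.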
📝 Abstract
We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole-slide images), anatomical localization via bounding boxes, multi-timepoint chest X-ray analysis, and improved medical document understanding (lab reports, electronic health records). We detail the innovations required to enable these modalities within a single architecture, including new training data, long-context 3D volume slicing, and whole-slide pathology sampling. Compared to MedGemma 1 4B, MedGemma 1.5 4B demonstrates significant gains in these new areas, improving 3D MRI condition classification accuracy by 11% and 3D CT condition classification by 3% (absolute improvements). In whole-slide pathology imaging, MedGemma 1.5 4B achieves a 47% macro F1 gain. Additionally, it improves anatomical localization with a 35% increase in Intersection over Union on chest X-rays and achieves a 4% gain in macro accuracy on longitudinal (multi-timepoint) chest X-ray analysis. Beyond its improved multimodal performance over MedGemma 1, MedGemma 1.5 also strengthens text-based clinical knowledge and reasoning, gaining 5% on MedQA accuracy and 22% on EHRQA accuracy. It also achieves an average of 18% macro F1 on 4 different lab report information extraction datasets (EHR Datasets 2, 3, 4, and Mendeley Clinical Laboratory Test Reports). Taken together, MedGemma 1.5 serves as a robust, open resource for the community, designed as an improved foundation on which developers can create the next generation of medical AI systems. Resources and tutorials for building upon MedGemma 1.5 can be found at https://goo.gle/MedGemma.
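The localization gains above are reported as Intersection over Union (IoU), the standard overlap metric for predicted versus ground-truth bounding boxes. As a small illustrative sketch (the helper name is ours, not from the paper), IoU is the intersection area divided by the union area:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    Illustrative helper, not MedGemma 1.5's evaluation code: shows the
    standard metric behind the reported chest X-ray localization numbers.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(box_iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 (perfect overlap)
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))  # ≈ 0.333 (half overlap)
```

IoU ranges from 0 (disjoint boxes) to 1 (identical boxes), so a 35% absolute increase represents a large improvement in how tightly predicted anatomical boxes match reference annotations.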
Problem

Research questions and friction points this paper is trying to address.

multimodal medical AI
3D medical imaging
whole-slide pathology
anatomical localization
medical document understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal medical AI
3D medical imaging
whole-slide pathology
anatomical localization
longitudinal chest X-ray analysis
👥 Authors
Andrew Sellergren
Software Engineer
computer vision, medical imaging, machine learning, artificial intelligence
Chufan Gao
University of Illinois Urbana-Champaign
Machine Learning for Healthcare, Natural Language Processing
Fereshteh Mahvar
Google Research and Google DeepMind
Timo Kohlberger
Google Research and Google DeepMind
Fayaz Jamil
Google Research and Google DeepMind
Madeleine Traverse
Google Research and Google DeepMind
Alberto Tono
Google Research and Google DeepMind
Bashir Sadjad
Google Research and Google DeepMind
Lin Yang
Google Health, University of Notre Dame
Computer Science
Charles Lau
Google Research and Google DeepMind
Liron Yatziv
Google Research and Google DeepMind
Tiffany Chen
Google Research and Google DeepMind
Bram Sterling
Google Research and Google DeepMind
Kenneth Philbrick
Google Research and Google DeepMind
Richa Tiwari
Google Research and Google DeepMind
Yun Liu
Senior Staff Research Scientist, Google Research
Applied Machine Learning, Healthcare, Biomedical Data
Madhuram Jajoo
Google Research and Google DeepMind
Chandrashekar Sankarapu
Google Research and Google DeepMind
Swapnil Vispute
Google Research and Google DeepMind
Harshad Purandare
Google Research and Google DeepMind
Abhishek Bijay Mishra
Google Research and Google DeepMind
Sam Schmidgall
Google Research and Google DeepMind
Tao Tu
Columbia University, Google
multi-modal neuroimaging, machine learning, neural information processing
Anil Palepu
PhD Student, Harvard-MIT Health Science & Technology
Chunjong Park
Google DeepMind