A multimodal and temporal foundation model for virtual patient representations at healthcare system scale

📅 2026-04-20

🤖 AI Summary
This work addresses the fragmentation of multimodal, longitudinal clinical data across siloed healthcare systems and the lack of a unified patient representation. The authors propose Apollo, a multimodal temporal foundation model covering 28 data modalities and 12 major medical specialties, trained and evaluated on more than three decades of records from a major US hospital system: 25 billion records from 7.2 million patients. Apollo embeds over 100,000 unique medical events together with clinical text and images in one representation space, which the authors call an "atlas of medical concepts", and compresses each patient's sequence of structured and unstructured events into a virtual patient representation. On a held-out test set of 1.4 million patients spanning 322 tasks, the embeddings are evaluated for forecasting new disease onset up to five years in advance, disease progression, treatment response, treatment-related adverse events, and hospital operations endpoints, as well as semantic similarity retrieval. The authors also demonstrate Apollo as a multimodal medical search engine driven by text and image queries.

📝 Abstract
Modern medicine generates vast multimodal data across siloed systems, yet no existing model integrates the full breadth and temporal depth of the clinical record into a unified patient representation. We introduce Apollo, a multimodal temporal foundation model trained and evaluated on over three decades of longitudinal hospital records from a major US hospital system, composed of 25 billion records from 7.2 million patients, representing 28 distinct medical modalities and 12 major medical specialties. Apollo learns a unified representation space integrating over 100 thousand unique medical events in our clinical vocabulary as well as images and clinical text. This "atlas of medical concepts" forms a computational substrate for modeling entire patient care journeys composed of sequences of structured and unstructured events, which are compressed by Apollo into virtual patient representations. To assess the potential of these whole-patient representations, we created 322 prognosis and retrieval tasks from a held-out test set of 1.4 million patients. We demonstrate the generalized clinical forecasting potential of Apollo embeddings, including predicting new disease onset risk up to five years in advance (95 tasks), disease progression (78 tasks), treatment response (59 tasks), risk of treatment-related adverse events (17 tasks), and hospital operations endpoints (12 tasks). Using feature attribution techniques, we show that model predictions align with clinically interpretable multimodal biomarkers. We evaluate semantic similarity search on 61 retrieval tasks, and moreover demonstrate the potential of Apollo as a multimodal medical search engine using text and image queries. Together, these modeling capabilities establish the foundation for computable medicine, where the full context of patient care becomes accessible to computational reasoning.
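The semantic similarity search mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity metric, index structure, or embedding size Apollo uses, so cosine similarity, the three-dimensional toy vectors, the patient identifiers, and the function names below are all assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query, index, k=2):
    """Return the k (patient_id, score) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; real patient representations would be
# produced by the model and have far more dimensions.
index = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.9, 0.1, 0.0],
    "patient_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]
results = top_k_similar(query, index, k=2)
print(results)
```

In this toy setup the query ranks patient_a first and patient_b second, since their vectors point in nearly the same direction as the query. A text or image query would work the same way once it is embedded into the shared space, which is what makes a single representation space usable as a search engine.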
Problem

Research questions and friction points this paper is trying to address.

multimodal
temporal
patient representation
clinical record integration
foundation model
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal foundation model
temporal patient representation
virtual patient
computable medicine
clinical forecasting
Authors

Andrew Zhang
PhD student, Harvard & MIT
computer vision, artificial intelligence, healthcare, medical devices, neuroscience

Tong Ding
PhD student in Computer Science, Harvard University
Representation Learning, Computer Vision, Multimodal Learning, Machine Learning for Health

Sophia J. Wagner
Technical University Munich, Helmholtz AI
computational pathology, deep learning, computer vision

Caiwei Tian
Department of Pathology, Mass General Brigham, Harvard Medical School, Boston, MA

Ming Y. Lu
MIT EECS, Harvard Medical School
Computational Pathology, Computer Vision, Natural Language Processing

Rowland Pettit
Department of Pathology, Mass General Brigham, Harvard Medical School, Boston, MA

Joshua E. Lewis
Department of Pathology, Mass General Brigham, Harvard Medical School, Boston, MA

Alexandre Misrahi
Department of Pathology, Mass General Brigham, Harvard Medical School, Boston, MA

Dandan Mo
Department of Pathology, Mass General Brigham, Harvard Medical School, Boston, MA

Long Phi Le
Massachusetts General Hospital
Molecular Diagnostics, Next-Generation Sequencing, Target Enrichment, Bioinformatics, Laboratory Informatics

Faisal Mahmood
Associate Professor, Harvard University