OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

📅 2026-05-04

🤖 AI Summary
This work addresses the limitations of existing ophthalmic AI systems, which are often confined to single-modality analysis and struggle to effectively integrate complementary 3D OCT and 2D en face OCT images while facing deployment challenges in resource-constrained settings. The authors propose OphMAE, a multimodal foundation model for ophthalmic diagnosis built upon a masked autoencoder framework, featuring cross-modal fusion and adaptive inference mechanisms that enable joint 3D/2D pretraining and efficient unimodal inference. Evaluated across 17 diagnostic tasks, OphMAE achieves state-of-the-art performance, with AUCs of 96.9% for AMD and 97.2% for DME. Notably, it maintains strong performance using only 2D inputs (AMD AUC: 93.7%) and retains an AUC of 95.7% with as few as 500 labeled samples, substantially alleviating modality dependency and data efficiency bottlenecks.
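As a rough illustration of the masked autoencoder idea mentioned above, the following minimal sketch selects which image patches are hidden from the encoder and must be reconstructed. It is not OphMAE's implementation: the function name `random_patch_mask`, the patch count of 196, and the 0.75 mask ratio are assumptions chosen for illustration, and the summary does not state the settings the authors used.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Split patch indices into (visible, masked) sets for one image.

    Hypothetical helper: a masked autoencoder encodes only the visible
    patches and is trained to reconstruct the masked ones.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


# Assumed values for illustration only (not taken from the paper).
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))
```

In a two-modality setting, one plausible design is to draw such a mask independently for the 3D and 2D inputs, but how OphMAE actually does this is not described here.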
📝 Abstract
The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance with clinical practice where diagnosis relies on the synthesis of complementary imaging modalities. Furthermore, the deployment of high-performance AI in resource-limited settings is frequently impeded by the unavailability of advanced three-dimensional imaging hardware. Here, we present the Ophthalmic multimodal Masked Autoencoder (OphMAE), a multi-imaging foundation model engineered to synergize the volumetric depth of 3D Optical Coherence Tomography (OCT) with the planar context of 2D en face OCT. By implementing a novel cross-modal fusion architecture and a unique adaptive inference mechanism, OphMAE was pre-trained on a massive dataset of 183,875 paired OCT images derived from 32,765 patients. In a rigorous benchmark encompassing 17 diverse diagnostic tasks with 48,340 paired OCT images from 8,191 patients, the model demonstrated state-of-the-art performance, achieving an Area Under the Curve (AUC) of 96.9% for Age-related Macular Degeneration (AMD) and 97.2% for Diabetic Macular Edema (DME), consistently surpassing existing single-modal and multimodal foundation models. Crucially, OphMAE exhibits robust engineering adaptability: it maintains high diagnostic accuracy, such as 93.7% AUC for AMD, even when restricted to single-modality 2D inputs, and demonstrates exceptional data efficiency by retaining 95.7% AUC with as few as 500 labeled samples. This work establishes a scalable and adaptable framework for ophthalmic AI, ensuring robust performance across different tasks.
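For readers unfamiliar with the AUC figures quoted above, here is a minimal sketch of how an area under the ROC curve can be computed for a binary task. It uses the standard pairwise interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half). The function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the paper.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = disease present, 0 = absent.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a demonstration; rank-based implementations are the usual choice at the scale of the benchmark described in the abstract.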
Problem

Research questions and friction points this paper is trying to address.

multimodal imaging
ophthalmic AI
3D OCT
2D en face OCT
resource-limited settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation model
multimodal fusion
adaptive inference
OCT imaging
data efficiency
Tienyu Chang
Department of Biostatistics and Health Data Science, Indiana University, Indianapolis, IN
Zhen Chen
Yale University
AI for healthcare, Multimodal large models, Medical image analysis, Surgery understanding
Renjie Liang
Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL
Jinyu Ding
Department of Biomedical Informatics and Data Science, Yale University, New Haven, CT
Jie Xu
Health Outcomes & Biomedical Informatics, University of Florida
Machine Learning, Data Mining, Health Informatics
Sunu Mathew
Radiology & Imaging Sciences, Indiana University, Indianapolis, IN
Amir Reza Hajrasouliha
Ophthalmology, Indiana University, Indianapolis, IN
Andrew J. Saykin
Radiology & Imaging Sciences, Indiana University, Indianapolis, IN
Ruogu Fang
Professor, University of Florida
Artificial Intelligence, Medical Image Analysis, Machine Learning, Brain Dynamics
Yu Huang
Department of Biostatistics and Health Data Science, Indiana University, Indianapolis, IN
Jiang Bian
Regenstrief Institute; Indiana University; IU Health
data science, real-world data, ontology/semantic, eHealth/social media
Qingyu Chen
Biomedical Informatics & Data Science, Yale University; NCBI-NLM, National Institutes of Health
Text mining, Machine learning, Data curation, BioNLP, Medical Imaging Analysis