🤖 AI Summary
This work addresses two limitations of existing ophthalmic AI systems: they are often confined to single-modality analysis and struggle to integrate complementary 3D OCT and 2D en face OCT images, and they are hard to deploy in resource-constrained settings. The authors propose OphMAE, a multimodal foundation model for ophthalmic diagnosis built on a masked autoencoder framework, with cross-modal fusion and adaptive inference mechanisms that enable joint 3D/2D pretraining and efficient unimodal inference. Evaluated across 17 diagnostic tasks, OphMAE achieves state-of-the-art performance, with AUCs of 96.9% for AMD and 97.2% for DME. It maintains strong performance using only 2D inputs (AMD AUC: 93.7%) and retains an AUC of 95.7% with as few as 500 labeled samples, reducing both its dependence on 3D imaging and its need for labeled data.
📝 Abstract
The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, which is at odds with clinical practice, where diagnosis relies on the synthesis of complementary imaging modalities. Furthermore, the deployment of high-performance AI in resource-limited settings is frequently impeded by the unavailability of advanced three-dimensional imaging hardware. Here, we present the Ophthalmic multimodal Masked Autoencoder (OphMAE), a multi-imaging foundation model engineered to combine the volumetric depth of 3D Optical Coherence Tomography (OCT) with the planar context of 2D en face OCT. OphMAE implements a novel cross-modal fusion architecture and an adaptive inference mechanism, and was pre-trained on a large dataset of 183,875 paired OCT images derived from 32,765 patients. In a benchmark encompassing 17 diverse diagnostic tasks with 48,340 paired OCT images from 8,191 patients, the model demonstrated state-of-the-art performance, achieving an Area Under the Curve (AUC) of 96.9% for Age-related Macular Degeneration (AMD) and 97.2% for Diabetic Macular Edema (DME), consistently surpassing existing single-modal and multimodal foundation models. Crucially, OphMAE adapts well to practical deployment constraints: it maintains high diagnostic accuracy, such as 93.7% AUC for AMD, even when restricted to single-modality 2D inputs, and demonstrates strong data efficiency by retaining 95.7% AUC with as few as 500 labeled samples. This work establishes a scalable and adaptable framework for ophthalmic AI that delivers robust performance across different tasks.