Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current medical imaging AI models are predominantly unimodal and disease-specific, exhibiting poor generalizability and heavy reliance on large-scale annotated datasets. To address these limitations, we propose MerMED-FM, a cross-modal, cross-disease foundation model for medical imaging that integrates self-supervised pretraining with a novel memory-augmented mechanism. The model is trained on a unified, multi-source dataset comprising 3.3 million images spanning seven modalities (CT, X-ray, ultrasound, histopathology, fundus photography, optical coherence tomography (OCT), and dermatoscopic imaging) across more than ten clinical specialties. This design significantly enhances robustness and clinical adaptability for multi-disease recognition. On diverse multimodal downstream tasks, the model achieves AUROC scores ranging from 0.858 to 0.988, consistently outperforming existing foundation models. These results validate its superior generalization capability and strong potential for real-world clinical deployment.

📝 Abstract
Current artificial intelligence models for medical imaging are predominantly single-modality and single-disease. Attempts to create multimodal, multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities, including computed tomography (CT), chest X-rays (CXR), ultrasound (US), pathology patches, colour fundus photography (CFP), optical coherence tomography (OCT), and dermatology images. MerMED-FM was evaluated across multiple diseases and compared against existing foundation models. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT), 0.982 (pathology), 0.951 (US), 0.943 (CT), 0.931 (skin), 0.894 (CFP), and 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable, versatile, cross-specialty foundation model that enables robust medical imaging interpretation across diverse medical disciplines.
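The figures reported above are areas under the receiver operating characteristic curve (AUROC): the probability that a randomly chosen positive case is scored above a randomly chosen negative. As a quick reference for the metric only (not the paper's evaluation code), a minimal pure-Python implementation via the Mann-Whitney statistic:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs where the positive is scored higher,
    counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy example: 3 of 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking; 1.0 to perfect separation, which puts the reported 0.858 to 0.988 range in context.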
Problem

Research questions and friction points this paper is trying to address.

Develops a multimodal, multi-disease medical imaging model
Addresses inconsistent clinical accuracy in existing models
Reduces reliance on large, labeled datasets via self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal multi-disease foundation model
Self-supervised learning with memory module
Trained on 3.3M images across 7 modalities
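The listing names self-supervised learning with a memory module as the core technique but gives no architectural detail. As a hedged sketch only, the following shows a generic FIFO feature memory of the kind used by queue-based contrastive learners (MoCo-style banks of past embeddings serving as negatives); `FeatureMemory`, `dim`, and `capacity` are illustrative names, not the authors' API:

```python
import numpy as np

class FeatureMemory:
    """Minimal FIFO memory bank of past embeddings, a generic stand-in
    for the memory-augmented mechanism described in the summary.
    The bank's contents act as negative keys in a contrastive loss."""

    def __init__(self, dim: int, capacity: int):
        self.capacity = capacity
        self.bank = np.zeros((0, dim), dtype=np.float32)

    def enqueue(self, feats: np.ndarray) -> None:
        # L2-normalise the new embeddings, append them, and evict the
        # oldest entries once the bank exceeds its capacity.
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        self.bank = np.concatenate([self.bank, feats])[-self.capacity:]

    def negatives(self) -> np.ndarray:
        # Current contents, used as negative samples for new queries.
        return self.bank

memory = FeatureMemory(dim=128, capacity=4096)
batch = np.random.default_rng(0).normal(size=(256, 128)).astype(np.float32)
memory.enqueue(batch)
print(memory.negatives().shape)  # (256, 128)
```

A bank like this decouples the number of contrastive negatives from the batch size, which is one common motivation for adding memory to self-supervised pretraining.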
👥 Authors
Yang Zhou
Institute of High Performance Computing (IHPC), A*STAR
Chrystie Wan Ning Quek
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore
Jun Zhou
Institute of High Performance Computing (IHPC), A*STAR
Yan Wang
Institute of High Performance Computing (IHPC), A*STAR
Yang Bai
Institute of High Performance Computing (IHPC), A*STAR
Yuhe Ke
Department of Anesthesiology, Singapore General Hospital; Duke-NUS Medical School, Singapore
Jie Yao
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Singapore Health Services, Artificial Intelligence Office
Laura Gutierrez
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Singapore Health Services, Artificial Intelligence Office
Zhen Ling Teo
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore
Darren Shu Jeng Ting
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Ophthalmology and Visual Sciences Academic Clinical Program, Duke-NUS Medical School, Singapore
Brian T. Soetikno
Institute of High Performance Computing (IHPC), A*STAR
Christopher S. Nielsen
Institute of High Performance Computing (IHPC), A*STAR
Tobias Elze
Schepens Eye Research Institute, Harvard Medical School
Zengxiang Li
Institute of High Performance Computing (IHPC), A*STAR
Linh Le Dinh
Institute of High Performance Computing (IHPC), A*STAR
Lionel Tim-Ee Cheng
Duke-NUS Medical School, Singapore; FRCR
Tran Nguyen Tuan Anh
Duke-NUS Medical School, Singapore; FRCR
Chee Leong Cheng
FRCPath
Tien Yin Wong
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Duke-NUS Medical School, Singapore
Nan Liu
Singapore Health Services, Artificial Intelligence Office
Iain Beehuat Tan
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Duke-NUS Medical School, Singapore
Tony Kiat Hon Lim
FRCPath
Rick Siow Mong Goh
Institute of High Performance Computing (IHPC), A*STAR
Yong Liu
Institute of High Performance Computing (IHPC), A*STAR
Daniel Shu Wei Ting
Singapore National Eye Centre, Singapore Eye Research Institute, Singapore; Duke-NUS Medical School, Singapore