🤖 AI Summary
Current medical imaging AI models are predominantly unimodal and disease-specific, generalize poorly, and rely heavily on large annotated datasets. To address these limitations, we propose MerMED-FM, the first cross-modal, cross-disease foundation model for medical imaging, which combines self-supervised pretraining with a novel memory-augmented mechanism. The model is trained on a unified, multi-source dataset of 3.3 million images spanning seven modalities (CT, chest X-ray, ultrasound, histopathology, color fundus photography, optical coherence tomography (OCT), and dermatology imaging) across more than ten clinical specialties. This design is intended to improve robustness and clinical adaptability for multi-disease recognition. On downstream tasks covering all seven modalities, the model achieves AUROC scores ranging from 0.858 (chest X-ray) to 0.988 (OCT), consistently outperforming existing foundation models. These results support its generalization capability and its potential for real-world clinical deployment.
📝 Abstract
Current artificial intelligence models for medical imaging are predominantly single-modality and single-disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, well-labelled datasets that are labour-intensive to produce. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities: computed tomography (CT), chest X-rays (CXR), ultrasound (US), pathology patches, color fundus photography (CFP), optical coherence tomography (OCT), and dermatology images. MerMED-FM was evaluated across multiple diseases and compared against existing foundation models. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT), 0.982 (pathology), 0.951 (US), 0.943 (CT), 0.931 (skin), 0.894 (CFP), and 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable, versatile, cross-specialty foundation model that enables robust medical imaging interpretation across diverse medical disciplines.