Multimodal, Multi-Disease Medical Imaging Foundation Model (MerMED-FM)

📅 2025-06-30

🤖 AI Summary

Most current medical imaging AI models handle a single modality and a single disease. Earlier multimodal, multi-disease attempts have shown inconsistent clinical accuracy, and training them typically requires large, labour-intensive labelled datasets. MerMED-FM is a multimodal, multi-specialty foundation model trained with self-supervised learning and a memory module. It was trained on 3.3 million medical images from over ten specialties and seven modalities: CT, chest X-ray (CXR), ultrasound, pathology patches, color fundus photography (CFP), optical coherence tomography (OCT), and dermatology images. Evaluated across multiple diseases and compared against existing foundation models, it reaches AUROCs from 0.858 (CXR) to 0.988 (OCT). The authors present it as an adaptable, cross-specialty foundation model for medical imaging interpretation.

📝 Abstract

Current artificial intelligence models for medical imaging are predominantly single-modality and single-disease. Attempts to create multimodal and multi-disease models have resulted in inconsistent clinical accuracy. Furthermore, training these models typically requires large, labour-intensive, well-labelled datasets. We developed MerMED-FM, a state-of-the-art multimodal, multi-specialty foundation model trained using self-supervised learning and a memory module. MerMED-FM was trained on 3.3 million medical images from over ten specialties and seven modalities: computed tomography (CT), chest X-rays (CXR), ultrasound (US), pathology patches, color fundus photography (CFP), optical coherence tomography (OCT) and dermatology images. MerMED-FM was evaluated across multiple diseases and compared against existing foundation models. Strong performance was achieved across all modalities, with AUROCs of 0.988 (OCT), 0.982 (pathology), 0.951 (US), 0.943 (CT), 0.931 (skin), 0.894 (CFP) and 0.858 (CXR). MerMED-FM has the potential to be a highly adaptable, versatile, cross-specialty foundation model that enables robust medical imaging interpretation across diverse medical disciplines.
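
The per-modality results above are reported as AUROC (area under the receiver operating characteristic curve). For readers unfamiliar with the metric, here is a minimal sketch of how a binary AUROC can be computed: it equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions; this is not the paper's evaluation code or data.

```python
def auroc(labels, scores):
    """Binary AUROC via pairwise comparison (Mann-Whitney U); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = disease present, 0 = absent.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
toy_auroc = auroc(toy_labels, toy_scores)
print(f"{toy_auroc:.3f}")  # 8 of 9 positive/negative pairs are ordered correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cases, so a rank-based implementation is the usual choice for large evaluation sets.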

Problem

Research questions and friction points this paper is trying to address.

Develops a multimodal, multi-disease medical imaging model

Addresses inconsistent clinical accuracy in existing models

Reduces reliance on large, labeled datasets via self-supervised learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal multi-disease foundation model

Self-supervised learning with memory module

Trained on 3.3M images across 7 modalities
