🤖 AI Summary
Clinical brain MRI analysis is hindered by heterogeneous, noisy data and the prohibitive cost of annotations, which impede the clinical deployment of automated models. To address this, the FOMO25 challenge provides the large-scale unlabeled clinical dataset FOMO60K and systematically evaluates the generalization of self-supervised foundation models in few-shot and out-of-distribution settings across three tasks: infarct classification, meningioma segmentation, and brain age regression. By benchmarking models directly on data drawn from real-world clinical workflows, the study reveals task-dependent effects of different self-supervision objectives and shows that even modestly sized pretrained models can perform strongly. Self-supervised pretraining substantially improves generalization, with the best out-of-distribution models surpassing in-domain supervised baselines, while increasing model scale and training duration yields no consistent gains.
📝 Abstract
Clinical deployment of automated brain MRI analysis faces a fundamental challenge: clinical data is heterogeneous and noisy, and high-quality labels are prohibitively costly to obtain. Self-supervised learning (SSL) can address this by leveraging the vast amounts of unlabeled data produced in clinical workflows to train robust *foundation models* that adapt out-of-domain with minimal supervision. However, the development of foundation models for brain MRI has been limited by small pretraining datasets and by in-domain benchmarking focused on high-quality, research-grade data. To address this gap, we organized the FOMO25 challenge as a satellite event at MICCAI 2025. FOMO25 provided participants with a large pretraining dataset, FOMO60K, and evaluated models on data sourced directly from clinical workflows in few-shot and out-of-domain settings. Tasks covered infarct classification, meningioma segmentation, and brain age regression, and the challenge considered both models pretrained only on FOMO60K (method track) and models pretrained on any data (open track). Nineteen foundation models from sixteen teams were evaluated using a standardized containerized pipeline. Results show that (a) self-supervised pretraining improves generalization on clinical data under domain shift, with the strongest models trained *out-of-domain* surpassing supervised baselines trained *in-domain*; (b) no single pretraining objective benefits all tasks: MAE favors segmentation, while hybrid reconstruction-contrastive objectives favor classification; and (c) small pretrained models achieved strong performance, and scaling model size or training duration did not yield reliable benefits.
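To make the contrast between the two pretraining objectives concrete, below is a minimal PyTorch sketch, not the challenge's reference code or any participant's method. It pairs an MAE-style masked-reconstruction loss with a hybrid reconstruction + contrastive (InfoNCE) loss; the tiny 3D encoder/decoder, the voxel-level masking, and all hyperparameters are illustrative assumptions.

```python
# Illustrative sketch of the two self-supervision objectives discussed above.
# All architectures and hyperparameters are placeholders, not FOMO25 methods.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyEncoder(nn.Module):
    """Placeholder 3D encoder mapping an MRI patch to a feature map."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, channels, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv3d(channels, channels, 3, stride=2, padding=1), nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class TinyDecoder(nn.Module):
    """Placeholder decoder mapping features back to input resolution."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose3d(channels, channels, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose3d(channels, 1, 4, stride=2, padding=1),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


def mae_loss(encoder, decoder, x, mask_ratio=0.75):
    """MAE-style objective: mask random voxels, reconstruct, score only masked ones."""
    mask = (torch.rand_like(x) < mask_ratio).float()
    recon = decoder(encoder(x * (1.0 - mask)))
    # Mean squared error restricted to the masked voxels.
    return ((recon - x) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)


def hybrid_loss(encoder, decoder, x_a, x_b, temperature=0.1, alpha=0.5):
    """Hybrid objective: reconstruction plus InfoNCE between two augmented views."""
    z_a, z_b = encoder(x_a), encoder(x_b)
    recon = F.mse_loss(decoder(z_a), x_a)
    # Global-average-pool to one embedding per volume, contrast across the batch.
    e_a = F.normalize(z_a.mean(dim=(2, 3, 4)), dim=1)
    e_b = F.normalize(z_b.mean(dim=(2, 3, 4)), dim=1)
    logits = e_a @ e_b.t() / temperature
    targets = torch.arange(x_a.size(0))
    contrastive = F.cross_entropy(logits, targets)
    return alpha * recon + (1.0 - alpha) * contrastive


if __name__ == "__main__":
    enc, dec = TinyEncoder(), TinyDecoder()
    vol = torch.randn(2, 1, 32, 32, 32)  # two toy MRI patches
    print("MAE loss:   ", mae_loss(enc, dec, vol).item())
    # A left-right flip stands in for a real augmentation pipeline.
    print("Hybrid loss:", hybrid_loss(enc, dec, vol, vol.flip(-1)).item())
```

The design difference the results point to: the MAE loss is purely local and dense, which plausibly suits voxel-level tasks like segmentation, while the hybrid loss adds a volume-level contrastive term that shapes global embeddings, which plausibly suits classification.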