🤖 AI Summary
Self-supervised learning (SSL) for medical imaging is hindered by the scarcity of large-scale, heterogeneous 3D brain MRI datasets. To address this, the authors introduce FOMO60K, a large-scale, publicly available brain MRI dataset designed for SSL, comprising 60,529 scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset spans clinical- and research-grade images, multiple MRI sequences, and wide anatomical and pathological variability, including scans with large brain anomalies. Only minimal preprocessing is applied, preserving the original image characteristics while lowering the barrier to entry for new users. Accompanying open-source code supports mainstream SSL methods (e.g., MAE, SimCLR) for pretraining and fine-tuning. FOMO60K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
📝 Abstract
We present FOMO60K, a large-scale, heterogeneous dataset of 60,529 brain Magnetic Resonance Imaging (MRI) scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing barriers to entry for new users. Accompanying code for self-supervised pretraining and fine-tuning is provided. FOMO60K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
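To make the pretraining setting concrete, the sketch below shows MAE-style random patch masking on a 3D volume, the core data transform behind masked-autoencoder pretraining mentioned above. This is a minimal illustration in NumPy, not the FOMO60K codebase; the function name, patch size, and mask ratio are illustrative assumptions.

```python
import numpy as np

def mask_volume(volume, patch_size=8, mask_ratio=0.75, seed=0):
    """Zero out a random subset of non-overlapping 3D patches (MAE-style).

    Illustrative only; real MAE implementations drop masked tokens rather
    than zeroing voxels, but the masking logic is the same.
    """
    rng = np.random.default_rng(seed)
    # Number of patches along each axis (volume dims assumed divisible).
    d, h, w = (s // patch_size for s in volume.shape)
    n_patches = d * h * w
    n_masked = int(round(mask_ratio * n_patches))
    # Choose which patches to hide from the encoder.
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked_idx] = True
    masked = volume.copy()
    for idx in masked_idx:
        i, j, k = np.unravel_index(idx, (d, h, w))
        masked[i * patch_size:(i + 1) * patch_size,
               j * patch_size:(j + 1) * patch_size,
               k * patch_size:(k + 1) * patch_size] = 0.0
    return masked, mask

# Toy 32^3 "scan": 64 patches of size 8^3, 75% masked.
vol = np.ones((32, 32, 32), dtype=np.float32)
masked_vol, mask = mask_volume(vol)
```

With a 32³ volume and 8³ patches there are 64 patches, of which 48 (75%) are zeroed; a pretraining objective would then reconstruct the hidden voxels from the visible ones.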