🤖 AI Summary
Self-supervised learning (SSL) for medical imaging is hindered by the scarcity of large-scale, heterogeneous 3D brain MRI datasets. To address this, the authors introduce FOMO60K, a large-scale, publicly available brain MRI dataset designed for SSL, comprising 60,529 scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset spans clinical- and research-grade images, multiple MRI sequences, and wide anatomical and pathological variability, including scans with large brain anomalies. Only minimal preprocessing is applied, preserving the original image characteristics while lowering the barrier to entry for new users. Accompanying open-source code supports mainstream SSL methods (e.g., MAE, SimCLR) for pretraining and fine-tuning. FOMO60K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
📝 Abstract
We present FOMO60K, a large-scale, heterogeneous dataset of 60,529 brain Magnetic Resonance Imaging (MRI) scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing barriers to entry for new users. Accompanying code for self-supervised pretraining and fine-tuning is provided. FOMO60K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
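To make the pretraining setting concrete, the sketch below shows MAE-style random patch masking on a 3D volume, the core data transform behind masked-autoencoder pretraining mentioned above. This is a minimal illustration in NumPy, not the FOMO60K codebase; the function name, patch size, and mask ratio are illustrative assumptions.

```python
import numpy as np

def mask_volume(volume, patch_size=8, mask_ratio=0.75, seed=0):
    """Zero out a random subset of non-overlapping 3D patches (MAE-style).

    Illustrative only; real MAE implementations drop masked tokens rather
    than zeroing voxels, but the masking logic is the same.
    """
    rng = np.random.default_rng(seed)
    # Number of patches along each axis (volume dims assumed divisible).
    d, h, w = (s // patch_size for s in volume.shape)
    n_patches = d * h * w
    n_masked = int(round(mask_ratio * n_patches))
    # Choose which patches to hide from the encoder.
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(n_patches, dtype=bool)
    mask[masked_idx] = True
    masked = volume.copy()
    for idx in masked_idx:
        i, j, k = np.unravel_index(idx, (d, h, w))
        masked[i * patch_size:(i + 1) * patch_size,
               j * patch_size:(j + 1) * patch_size,
               k * patch_size:(k + 1) * patch_size] = 0.0
    return masked, mask

# Toy 32^3 "scan": 64 patches of size 8^3, 75% masked.
vol = np.ones((32, 32, 32), dtype=np.float32)
masked_vol, mask = mask_volume(vol)
```

With a 32³ volume and 8³ patches there are 64 patches, of which 48 (75%) are zeroed; a pretraining objective would then reconstruct the hidden voxels from the visible ones.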