A large-scale heterogeneous 3D magnetic resonance brain imaging dataset for self-supervised learning

📅 2025-06-17
🤖 AI Summary
Self-supervised learning (SSL) for medical imaging is held back by the scarcity of large, heterogeneous 3D brain MRI datasets. FOMO60K addresses this with a publicly available collection of 60,529 scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. It covers clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Preprocessing is kept minimal to preserve the original image characteristics while lowering the barrier to entry for new users. The release includes code for self-supervised pretraining and finetuning, and the dataset is intended to support the development and benchmarking of SSL methods in medical imaging at scale.

📝 Abstract
We present FOMO60K, a large-scale, heterogeneous dataset of 60,529 brain Magnetic Resonance Imaging (MRI) scans from 13,900 sessions and 11,187 subjects, aggregated from 16 publicly available sources. The dataset includes both clinical- and research-grade images, multiple MRI sequences, and a wide range of anatomical and pathological variability, including scans with large brain anomalies. Minimal preprocessing was applied to preserve the original image characteristics while reducing barriers to entry for new users. Accompanying code for self-supervised pretraining and finetuning is provided. FOMO60K is intended to support the development and benchmarking of self-supervised learning methods in medical imaging at scale.
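To put the headline counts in perspective, the short sketch below derives average ratios from the three figures stated in the abstract. It is illustrative arithmetic only: the constant and function names are hypothetical, and the abstract does not say how scans are distributed across subjects or sources, so these are plain averages rather than properties of any individual subject.

```python
# Counts taken directly from the abstract.
SCANS = 60_529
SESSIONS = 13_900
SUBJECTS = 11_187


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Averages only; the per-subject distribution is not given in the abstract.
scans_per_subject = ratio(SCANS, SUBJECTS)
scans_per_session = ratio(SCANS, SESSIONS)
sessions_per_subject = ratio(SESSIONS, SUBJECTS)

print(f"scans per subject:    {scans_per_subject}")
print(f"scans per session:    {scans_per_session}")
print(f"sessions per subject: {sessions_per_subject}")
```

On average this works out to roughly five scans per subject and four per session, which is consistent with the abstract's description of multiple MRI sequences per imaging session.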
Problem

Research questions and friction points this paper is trying to address.

Provide large-scale MRI dataset for self-supervised learning
Include diverse brain images with anatomical variability
Support development of medical imaging benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale heterogeneous MRI dataset FOMO60K
Minimal preprocessing preserves original characteristics
Self-supervised pretraining and finetuning code provided
Authors
Asbjørn Munk
PhD Fellow, University of Copenhagen, Pioneer Centre for AI
self-supervised learning, medical image analysis
Stefano Cerri
Department of Computer Science, University of Copenhagen, Denmark; Pioneer Centre For AI, Denmark; Copenhagen Research Centre for Biological and Precision Psychiatry, Mental Health Centre Copenhagen, Copenhagen University Hospital, Denmark
Jakob Ambsdorf
PhD Student, Pioneer Centre for AI, University of Copenhagen
self-supervised learning, explainable AI, computer vision, medical imaging
Julia Machnio
PhD Fellow, Pioneer Centre for AI, University of Copenhagen
Deep Learning, Medical Image Analysis, Machine Learning, MRI
Sebastian Nørgaard Llambias
IT Consultant, University of Copenhagen
Machine Learning, Medical Imaging
Vardan Nersesjan
Copenhagen Research Centre for Biological and Precision Psychiatry, Mental Health Centre Copenhagen, Copenhagen University Hospital, Denmark
Christian Hedeager Krag
Radiological AI Testcenter, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Peirong Liu
Assistant Professor of ECE, Johns Hopkins University
AI for Healthcare, Computer Vision, Medical Imaging
Pablo Rocamora García
Department of Computer Science, University of Copenhagen, Denmark; Pioneer Centre For AI, Denmark
Mostafa Mehdipour Ghazi
University of Copenhagen
Deep Learning, Machine Learning, Artificial Intelligence, Computer Vision, Image Analysis
Mikael Boesen
Radiological AI Testcenter, Denmark; Copenhagen University Hospital, Bispebjerg & Frederiksberg Hospital, Denmark
Michael Eriksen Benros
Professor, University of Copenhagen and Copenhagen University Hospital, Mental Health Centre
Mental Health, ImmunoPsychiatry, Precision Psychiatry, Genetics, Big Data
Juan Eugenio Iglesias
Massachusetts General Hospital & Harvard Medical School / MIT / UCL
Medical Image Analysis
Mads Nielsen
Professor of Computer Science, University of Copenhagen
Computer Science, Computer Vision, Machine Learning, Artificial Intelligence, Medical Applications