🤖 AI Summary
Current 3D medical self-supervised learning (SSL) suffers from inconsistent pretraining dataset scale and diversity, heterogeneous model architectures, and a lack of standardized downstream evaluation protocols, hindering fair methodological comparison. To address this, the paper introduces the largest publicly available 3D brain MRI pre-training dataset (114k volumes) and establishes a unified benchmark that standardizes data, architectures (a state-of-the-art CNN and a Transformer), and downstream segmentation tasks in order to systematically evaluate mainstream SSL paradigms such as contrastive learning and masked modeling. The contributions are threefold: (1) the largest open-source 3D brain MRI pre-training dataset to date; (2) a standardized benchmark of existing 3D medical SSL methods; and (3) open-sourced pre-training and fine-tuning frameworks, together with the pre-trained models produced during benchmarking. Experiments show that SSL pre-training can exceed a strong from-scratch nnU-Net ResEnc-L baseline and improves segmentation performance in low-data regimes.
📝 Abstract
The field of self-supervised learning (SSL) for 3D medical images lacks consistency and standardization. While many methods have been developed, it is impossible to identify the current state-of-the-art due to i) varying and small pre-training datasets, ii) varying architectures, and iii) evaluation on differing downstream datasets. In this paper, we bring clarity to this field and lay the foundation for further method advancements through three key contributions. We a) publish the largest publicly available pre-training dataset, comprising 114k 3D brain MRI volumes, enabling all practitioners to pre-train on a large-scale dataset. We b) benchmark existing 3D self-supervised learning methods on this dataset for a state-of-the-art CNN and a Transformer architecture, clarifying the state of 3D SSL pre-training. Among many findings, we show that pre-trained methods can exceed a strong from-scratch nnU-Net ResEnc-L baseline. Lastly, we c) publish the code of our pre-training and fine-tuning frameworks and provide the pre-trained models created during the benchmarking process to facilitate rapid adoption and reproduction.
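The masked-modeling paradigm benchmarked above can be sketched in a few lines: hide a large fraction of non-overlapping 3D patches of a volume and score reconstruction only on the hidden voxels. This is a minimal illustrative sketch, not the paper's implementation; the patch size, mask ratio, and function names are assumptions, and the encoder/decoder network is omitted.

```python
import numpy as np

def random_patch_mask(vol_shape, patch=(8, 8, 8), mask_ratio=0.75, rng=None):
    """Boolean mask over a 3D volume: True where voxels are hidden.

    Partitions the volume into non-overlapping patch-sized blocks and hides
    `mask_ratio` of them, in the style of MAE-like masked modeling.
    Assumes each volume dimension is divisible by the patch size.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    grid = tuple(s // p for s, p in zip(vol_shape, patch))
    n_patches = int(np.prod(grid))
    hidden = np.zeros(n_patches, dtype=bool)
    hidden[rng.choice(n_patches, size=int(mask_ratio * n_patches),
                      replace=False)] = True
    # Expand each per-patch decision back to voxel resolution.
    mask = hidden.reshape(grid)
    for axis, p in enumerate(patch):
        mask = np.repeat(mask, p, axis=axis)
    return mask

def masked_reconstruction_loss(pred, target, mask):
    """MSE computed only on masked voxels (the masked-modeling objective)."""
    return float(((pred - target) ** 2)[mask].mean())

# Usage: a 32^3 toy "scan" reconstructed by a trivial zero predictor.
vol = np.random.default_rng(1).standard_normal((32, 32, 32))
mask = random_patch_mask(vol.shape)
loss = masked_reconstruction_loss(np.zeros_like(vol), vol, mask)
```

Contrastive methods differ only in the objective: instead of reconstructing hidden voxels, they pull embeddings of two augmented views of the same volume together while pushing other volumes apart.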