Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

📅 2024-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluation of self-supervised learning (SSL) for medical imaging is fragmented and incomplete. Method: We introduce MedSSL-Bench, the first systematic benchmark for SSL in medical imaging, spanning 11 diverse medical datasets and 8 representative SSL methods. We propose a multi-dimensional evaluation framework that uniformly quantifies robustness, cross-domain generalization, label-scarcity adaptability, and multimodal pretraining efficacy, alongside a standardized MedMNIST evaluation protocol. Contribution/Results: Experiments reveal substantial performance disparities among SSL methods in both generalization and robustness. Joint pretraining across multiple domains yields an average accuracy gain of 5.2%. We publicly release fully reproducible code, configuration files, and a live leaderboard to support rigorous, transparent model selection for clinical AI deployment.

📝 Abstract
Self-supervised learning (SSL) has emerged as a promising paradigm in medical imaging, addressing the chronic challenge of limited labeled data in healthcare settings. While SSL has shown impressive results, existing studies in the medical domain are often limited in scope, focusing on specific datasets or modalities, or evaluating only isolated aspects of model performance. This fragmented evaluation approach poses a significant challenge, as models deployed in critical medical settings must not only achieve high accuracy but also demonstrate robust performance and generalizability across diverse datasets and varying conditions. To address this gap, we present a comprehensive evaluation of SSL methods within the medical domain, with a particular focus on robustness and generalizability. Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets. Our study provides an in-depth analysis of model performance in in-domain scenarios and in the detection of out-of-distribution (OOD) samples, while exploring the effect of various initialization strategies, model architectures, and multi-domain pre-training. We further assess the generalizability of SSL methods through cross-dataset evaluations, as well as in-domain performance with varying label proportions (1%, 10%, and 100%) to simulate real-world scenarios with limited supervision. We hope this comprehensive benchmark helps practitioners and researchers make more informed decisions when applying SSL methods to medical applications.
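The label-proportion protocol mentioned above (training on 1%, 10%, or 100% of labels) is typically implemented as a stratified subsample of the labeled training set. A minimal sketch follows; the function name and the per-class sampling details are our own illustrative assumptions, not the paper's released code.

```python
import numpy as np

def subsample_labels(labels, fraction, seed=0):
    """Stratified subsample: keep `fraction` of the labeled indices per class,
    so every class remains represented even at very low label budgets."""
    rng = np.random.default_rng(seed)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Keep at least one example per class to avoid empty classes at 1%.
        n = max(1, int(round(fraction * len(idx))))
        keep.extend(idx[:n])
    return np.sort(np.asarray(keep))

# Example: simulate the 1% / 10% / 100% regimes on a toy balanced label array.
labels = np.repeat([0, 1, 2], 200)  # 600 samples, 3 classes
for frac in (0.01, 0.10, 1.00):
    subset = subsample_labels(labels, frac)
    print(f"{frac:.0%}: {len(subset)} labeled samples")
```

The frozen SSL encoder's features on the selected indices would then be used to fit the downstream classifier (e.g. a linear probe), repeating across seeds to average out subsampling noise.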
Problem

Research questions and friction points this paper is trying to address.

Self-Supervised Learning
Medical Imaging
Performance Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Supervised Learning
Medical Imaging
Adaptability and Stability
Valay Bundele
PhD student, International Max Planck Research School for Intelligent Systems, Universität Tübingen
Machine Learning · Computer Vision
Oğuz Ata Çal
University of Tübingen
Bora Kargi
University of Tübingen
Karahan Sarıtaş
University of Tübingen
Kıvanç Tezören
University of Tübingen
Zohreh Ghaderi
University of Tübingen
Hendrik Lensch
University of Tübingen