General Methods Make Great Domain-specific Foundation Models: A Case-study on Fetal Ultrasound

📅 2025-06-24
🤖 AI Summary
This study investigates the optimal paradigm for building domain-specific foundation models for fetal ultrasound when large-scale unlabeled data are available: are medical-domain-specific self-supervised learning methods necessary, or do established general-purpose vision methods suffice? Method: Leveraging the DINOv2 framework, the authors conduct self-supervised pretraining on 2 million fetal ultrasound images and systematically compare domain-specific pretraining, transfer learning from natural-image models, and supervised baselines. Contribution/Results: Reusing off-the-shelf general vision methods on domain-specific data, without algorithmic innovation or hyperparameter tuning, consistently outperforms transfer-learning baselines across classification, segmentation, and few-shot tasks, achieving state-of-the-art performance on three international fetal ultrasound datasets. This work challenges the prevailing assumption that medical imaging inherently requires novel algorithms, demonstrating instead that data specificity, not methodological novelty, is the primary determinant of performance.

📝 Abstract
With access to large-scale, unlabeled medical datasets, researchers are confronted with two questions: Should they attempt to pretrain a custom foundation model on this medical data, or use transfer-learning from an existing generalist model? And, if a custom model is pretrained, are novel methods required? In this paper we explore these questions by conducting a case-study, in which we train a foundation model on a large regional fetal ultrasound dataset of 2M images. By selecting the well-established DINOv2 method for pretraining, we achieve state-of-the-art results on three fetal ultrasound datasets, covering data from different countries, classification, segmentation, and few-shot tasks. We compare against a series of models pretrained on natural images, ultrasound images, and supervised baselines. Our results demonstrate two key insights: (i) Pretraining on custom data is worth it, even if smaller models are trained on less data, as scaling in natural image pretraining does not translate to ultrasound performance. (ii) Well-tuned methods from computer vision are making it feasible to train custom foundation models for a given medical domain, requiring no hyperparameter tuning and little methodological adaptation. Given these findings, we argue that a bias towards methodological innovation should be avoided when developing domain-specific foundation models under common computational resource constraints.
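The few-shot comparisons mentioned above typically follow a frozen-feature linear-probe protocol: extract embeddings with the pretrained backbone, then fit a linear classifier on only k labeled examples per class. The sketch below is illustrative only; it uses a stand-in random-projection "backbone" and synthetic data, since the actual DINOv2-pretrained ultrasound weights and the paper's exact probe setup are not reproduced here.

```python
# Hedged sketch of a k-shot linear probe on frozen features.
# The "backbone" is a stand-in random projection; in the paper the features
# would come from a pretrained ViT (an assumption on our part).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def backbone(images):
    """Stand-in frozen feature extractor: flatten + fixed random projection."""
    flat = images.reshape(len(images), -1)
    proj = np.random.default_rng(42).standard_normal((flat.shape[1], 64))
    return flat @ proj

def make_split(n_per_class, image_shape=(8, 8)):
    """Toy two-class 'ultrasound' data with shifted class means."""
    xs, ys = [], []
    for label, shift in [(0, -0.5), (1, 0.5)]:
        xs.append(rng.standard_normal((n_per_class, *image_shape)) + shift)
        ys.append(np.full(n_per_class, label))
    return np.concatenate(xs), np.concatenate(ys)

k = 5  # shots per class
train_x, train_y = make_split(k)
test_x, test_y = make_split(50)

# Backbone stays frozen; only the linear head is trained on the k shots.
probe = LogisticRegression(max_iter=1000)
probe.fit(backbone(train_x), train_y)
acc = probe.score(backbone(test_x), test_y)
print(f"{k}-shot linear-probe accuracy: {acc:.2f}")
```

The key property of the protocol is that pretraining quality is measured almost directly: with the backbone frozen and only k labels available, the probe can only succeed if the features already separate the classes.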
Problem

Research questions and friction points this paper is trying to address.

Should researchers pretrain custom foundation models on medical data, or transfer from existing generalist models?
Are novel self-supervised methods required when pretraining custom medical foundation models?
Can established general computer vision methods be applied to medical domains without adaptation?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reuses the off-the-shelf DINOv2 method for self-supervised pretraining
Pretrains on a large regional fetal ultrasound dataset of 2M images
Requires no hyperparameter tuning and little methodological adaptation
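DINOv2's core training signal is DINO-style self-distillation: a student network matches a slowly updated exponential-moving-average (EMA) teacher across two augmented views of the same image. The toy sketch below shows one training step of that mechanism under stated simplifications: a tiny MLP instead of a ViT, additive noise instead of real augmentations, and no multi-crop or teacher-output centering (which full DINO/DINOv2 also use).

```python
# Hedged sketch of DINO-style self-distillation (the core of DINOv2 pretraining).
# Architecture, augmentations, and hyperparameters here are illustrative toys,
# not the paper's actual configuration.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

student = nn.Sequential(nn.Flatten(), nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 8))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)  # teacher is updated only by EMA, never by gradients

opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
tau_s, tau_t, ema = 0.1, 0.04, 0.996  # student/teacher temperatures, EMA rate

def two_views(x):
    """Two random 'augmentations' of the same batch (here: additive noise)."""
    return x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x)

x = torch.randn(4, 4, 4)  # toy batch of 4x4 "images"
v1, v2 = two_views(x)

# Cross-entropy between the teacher's sharpened distribution on one view
# and the student's distribution on the other view, symmetrized.
t1 = F.softmax(teacher(v1) / tau_t, dim=-1).detach()
t2 = F.softmax(teacher(v2) / tau_t, dim=-1).detach()
s1 = F.log_softmax(student(v1) / tau_s, dim=-1)
s2 = F.log_softmax(student(v2) / tau_s, dim=-1)
loss = -0.5 * ((t1 * s2).sum(-1).mean() + (t2 * s1).sum(-1).mean())

opt.zero_grad()
loss.backward()
opt.step()

# EMA update: drift the teacher slowly toward the student.
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(ema).add_(ps, alpha=1 - ema)

print(f"self-distillation loss: {loss.item():.4f}")
```

The paper's point is precisely that this recipe can be reused as-is: the authors report applying DINOv2 to 2M ultrasound images without hyperparameter tuning, changing only the data.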