AI Summary
Accurate identification of fetal brain standard planes (e.g., transcerebellar, transventricular, and transthalamic views) in ultrasound is challenging because inter-class discriminability is extremely low. Method: We introduce FetalUS-188K, the first large-scale, multi-center benchmark dataset for fetal ultrasound, and conduct the first domain-adaptive self-supervised pretraining of a foundation model (DINOv3) specifically for this modality. We validate the need for ultrasound-specific pretraining: compared to initialization from natural-image models, in-domain pretraining on FetalUS-188K substantially enhances representation discriminability, yielding up to a 20% improvement in weighted F1-score. Results are consistent across linear probing and full fine-tuning, confirming that domain-specific representation learning is critical for tasks with minimal inter-class variation. This work establishes a methodological paradigm and empirical benchmark for domain adaptation of foundation models in medical imaging.
Abstract
Purpose: This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability conditions. While recent vision foundation models such as DINOv3 have shown remarkable transferability across medical domains, their ability to discriminate anatomically similar structures has not been systematically investigated. We address this gap by focusing on three fetal brain standard planes, transthalamic (TT), transventricular (TV), and transcerebellar (TC), which exhibit highly overlapping anatomical features and pose a critical challenge for reliable biometric assessment. Methods: To ensure a fair and reproducible evaluation, all publicly available fetal ultrasound datasets were curated and aggregated into a unified multicenter benchmark, FetalUS-188K, comprising more than 188,000 annotated images from heterogeneous acquisition settings. DINOv3 was pretrained on this benchmark in a self-supervised manner to learn ultrasound-aware representations. The learned features were then evaluated through standardized adaptation protocols, namely linear probing with a frozen backbone and full fine-tuning, under two initialization schemes: (i) pretraining on FetalUS-188K and (ii) initialization from natural-image DINOv3 weights. Results: Models pretrained on fetal ultrasound data consistently outperformed those initialized from natural-image weights, with weighted F1-score improvements of up to 20 percent. Domain-adaptive pretraining enabled the network to preserve subtle echogenic and structural cues crucial for distinguishing intermediate planes such as TV. Conclusion: The results demonstrate that generic foundation models fail to generalize under low inter-class variability, whereas domain-specific pretraining is essential to achieve robust and clinically reliable representations in fetal brain ultrasound imaging.
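The abstract reports results as weighted F1-score but does not show how the metric is computed. As a minimal sketch, assuming the usual definition (per-class F1 averaged with weights proportional to each class's number of true samples), the calculation for the three planes could look like the following. The function name `weighted_f1` and the toy labels are illustrative assumptions and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Illustrative helper (not from the paper): each class's F1 is weighted
    by the fraction of true samples that belong to that class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    support = Counter(y_true)  # number of true samples per class
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy labels for the three planes (hypothetical, for illustration only).
y_true = ["TT", "TT", "TV", "TV", "TC", "TC"]
y_pred = ["TT", "TV", "TV", "TV", "TC", "TT"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a model that confuses a less frequent plane is penalized less than under an unweighted (macro) average, which is worth keeping in mind when comparing the two initialization schemes on imbalanced data.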