Challenging DINOv3 Foundation Model under Low Inter-Class Variability: A Case Study on Fetal Brain Ultrasound

📅 2025-11-01

🤖 AI Summary
Identifying fetal brain standard planes in ultrasound (the transthalamic, transventricular, and transcerebellar views) is challenging because inter-class discriminability is extremely low. Method: We introduce FetalUS-188K, the first large-scale, multi-center benchmark dataset for fetal ultrasound, and conduct the first domain-adaptive self-supervised pretraining of a foundation model (DINOv3) specifically for this modality. We show that ultrasound-specific pretraining is necessary: compared with initialization from natural-image weights, in-domain pretraining on FetalUS-188K makes the learned representations substantially more discriminative, improving the weighted F1-score by up to 20%. The gain holds under both linear probing and full fine-tuning, which supports the claim that domain-specific representation learning is critical for tasks with minimal inter-class variation. This work provides a methodology and an empirical benchmark for adapting foundation models to medical imaging.

📝 Abstract
Purpose: This study provides the first comprehensive evaluation of foundation models in fetal ultrasound (US) imaging under low inter-class variability. Recent vision foundation models such as DINOv3 have shown remarkable transferability across medical domains, but their ability to discriminate anatomically similar structures has not been systematically investigated. We address this gap by focusing on three fetal brain standard planes, transthalamic (TT), transventricular (TV), and transcerebellar (TC), which share highly overlapping anatomical features and pose a critical challenge for reliable biometric assessment.
Methods: To ensure a fair and reproducible evaluation, all publicly available fetal ultrasound datasets were curated and aggregated into a unified multicenter benchmark, FetalUS-188K, comprising more than 188,000 annotated images from heterogeneous acquisition settings. DINOv3 was pretrained on this benchmark in a self-supervised manner to learn ultrasound-aware representations. The learned features were then evaluated with standardized adaptation protocols, namely linear probing with a frozen backbone and full fine-tuning, under two initialization schemes: (i) pretraining on FetalUS-188K and (ii) initialization from natural-image DINOv3 weights.
Results: Models pretrained on fetal ultrasound data consistently outperformed those initialized from natural-image weights, with weighted F1-score improvements of up to 20 percent. Domain-adaptive pretraining enabled the network to preserve the subtle echogenic and structural cues needed to distinguish intermediate planes such as TV.
Conclusion: The results demonstrate that generic foundation models fail to generalize under low inter-class variability, whereas domain-specific pretraining is essential for robust and clinically reliable representations in fetal brain ultrasound imaging.
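The abstract reports its gains as a weighted F1-score. As a minimal sketch of how that metric is commonly computed (per-class F1 averaged with weights equal to each class's share of the true labels), the following may help. The function name `weighted_f1` and the ten example labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Classes are taken from y_true; each class's F1 is weighted by the
    fraction of true samples that belong to it.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n_cls / total) * f1
    return score


# Made-up labels for the three planes (illustration only).
y_true = ["TT", "TT", "TT", "TT", "TV", "TV", "TC", "TC", "TC", "TC"]
y_pred = ["TT", "TT", "TT", "TV", "TT", "TV", "TC", "TC", "TC", "TV"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.723
```

In this toy example the intermediate TV plane has the lowest per-class F1 (0.4), but it contributes only 20% of the weight because it has the fewest true samples. That is why per-class scores are worth inspecting alongside the weighted average.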
Problem

Research questions and friction points this paper is trying to address.

Evaluating DINOv3's ability to distinguish anatomically similar fetal brain ultrasound planes
Addressing low inter-class variability challenges in fetal ultrasound imaging
Developing domain-specific pretraining for reliable fetal brain biometric assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-adaptive pretraining on fetal ultrasound data
Self-supervised learning of ultrasound-aware representations
Multicenter benchmark with standardized adaptation protocols
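
The two adaptation protocols differ only in which parameters receive gradient updates. The toy sketch below makes that concrete with a one-parameter "backbone" and a one-parameter "head" fitted by plain gradient descent. Everything in it (the `train` function, the scalar weights, the data) is a hypothetical stand-in chosen for illustration; it is not DINOv3 or the authors' training setup.

```python
def train(data, mode, steps=500, lr=0.02):
    """Fit y = w_head * (w_backbone * x) by gradient descent on squared error.

    mode="linear_probe": the backbone is frozen; only w_head is updated.
    mode="fine_tune":    both w_backbone and w_head are updated.
    """
    if mode not in ("linear_probe", "fine_tune"):
        raise ValueError("mode must be 'linear_probe' or 'fine_tune'")
    w_backbone = 0.5  # stand-in for pretrained encoder weights
    w_head = 0.0      # stand-in for a freshly initialized classifier
    n = len(data)
    for _ in range(steps):
        g_head = 0.0
        g_backbone = 0.0
        for x, target in data:
            feature = w_backbone * x            # backbone output
            error = w_head * feature - target   # prediction error
            g_head += 2 * error * feature / n
            g_backbone += 2 * error * w_head * x / n
        w_head -= lr * g_head
        if mode == "fine_tune":
            w_backbone -= lr * g_backbone
    return w_backbone, w_head


toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target = 2 * x
probe_backbone, probe_head = train(toy_data, "linear_probe")
tuned_backbone, tuned_head = train(toy_data, "fine_tune")
print(probe_backbone)                  # 0.5, unchanged because it is frozen
print(tuned_backbone != 0.5)           # True, the backbone moved
```

Under linear probing the head has to compensate for whatever the frozen backbone produces, so the result is a direct readout of representation quality. Full fine-tuning lets the backbone itself adapt, which is why the paper reports both protocols for each initialization.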