🤖 AI Summary
Supervised learning for automatic cystic hygroma detection in first-trimester ultrasound is hindered by severe scarcity of annotated data. Method: USF-MAE, an ultrasound-specific self-supervised foundation model pretrained via masked autoencoding on over 370,000 unlabelled ultrasound images, is fine-tuned for binary classification and compared against a supervised DenseNet-169 baseline; evaluation employs 4-fold cross-validation and Score-CAM for interpretability analysis. Results: USF-MAE achieves accuracy = 0.96, sensitivity = 0.94, specificity = 0.98, and ROC-AUC = 0.98, significantly outperforming the DenseNet-169 baseline (Wilcoxon signed-rank test, p = 0.0057). Score-CAM visualizations confirm that the model consistently attends to the clinically critical fetal nuchal region, demonstrating both strong classification performance and clinical interpretability.
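The masked-autoencoding pretraining the summary refers to can be illustrated with a minimal sketch: an image is split into non-overlapping patches and a large random subset is hidden, so the encoder sees only the visible patches and the decoder learns to reconstruct the masked ones. The patch size, mask ratio, and function below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def random_patch_mask(image, patch=16, mask_ratio=0.75, seed=0):
    """MAE-style random patch masking (illustrative parameters only).

    Splits a square grayscale image into non-overlapping `patch` x `patch`
    tiles, then hides a random `mask_ratio` fraction of them. During
    pretraining, the encoder would embed only the visible patches and the
    decoder would reconstruct the masked ones from those embeddings.
    """
    h, w = image.shape
    ph, pw = h // patch, w // patch
    # Reshape into (num_patches, patch*patch) flattened tiles.
    patches = (image.reshape(ph, patch, pw, patch)
                    .transpose(0, 2, 1, 3)
                    .reshape(ph * pw, patch * patch))
    n = patches.shape[0]
    rng = np.random.default_rng(seed)
    masked_idx = rng.choice(n, size=int(n * mask_ratio), replace=False)
    visible = np.delete(patches, masked_idx, axis=0)  # encoder input
    targets = patches[masked_idx]                     # reconstruction targets
    return visible, targets, masked_idx

# A 224x224 image yields 14*14 = 196 patches; masking 75% leaves 49 visible.
vis, tgt, idx = random_patch_mask(np.zeros((224, 224)))
```

The reconstruction loss (typically mean squared error between predicted and original masked patches) provides the self-supervised signal, so no diagnostic labels are needed at this stage.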
📝 Abstract
Cystic hygroma is a high-risk prenatal ultrasound finding that portends high rates of chromosomal abnormalities, structural malformations, and adverse pregnancy outcomes. Automated detection can increase reproducibility and support scalable early screening programs, but supervised deep learning methods are limited by small labelled datasets. This study assesses whether ultrasound-specific self-supervised pretraining can enable accurate, robust deep learning detection of cystic hygroma in first-trimester ultrasound images. We fine-tuned the Ultrasound Self-Supervised Foundation Model with Masked Autoencoding (USF-MAE), pretrained on over 370,000 unlabelled ultrasound images, for binary classification of cystic hygroma cases versus normal controls. Performance was evaluated using the same curated ultrasound dataset, preprocessing pipeline, and 4-fold cross-validation protocol as the supervised DenseNet-169 baseline, with accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC-AUC) as metrics. Model interpretability was analyzed qualitatively using Score-CAM visualizations. USF-MAE outperformed the DenseNet-169 baseline on all evaluation metrics, yielding a mean accuracy of 0.96, sensitivity of 0.94, specificity of 0.98, and ROC-AUC of 0.98, compared to 0.93, 0.92, 0.94, and 0.94 for the baseline, respectively. Qualitative Score-CAM visualizations of model predictions demonstrated clinical relevance by highlighting expected regions in the fetal neck for both positive and negative cases. Paired statistical analysis using a Wilcoxon signed-rank test confirmed that the performance improvements achieved by USF-MAE were statistically significant (p = 0.0057).
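For clarity on the reported metrics, the sketch below computes accuracy, sensitivity, and specificity from a binary confusion matrix (positive class = cystic hygroma). The counts are illustrative assumptions chosen only to show how values like those above arise; they are not the study's actual confusion matrix.

```python
def screening_metrics(tp, fn, tn, fp):
    """Classification metrics from confusion-matrix counts.

    tp: cystic hygroma cases correctly flagged (true positives)
    fn: cystic hygroma cases missed (false negatives)
    tn: normal controls correctly cleared (true negatives)
    fp: normal controls incorrectly flagged (false positives)
    """
    sensitivity = tp / (tp + fn)            # true-positive rate (recall)
    specificity = tn / (tn + fp)            # true-negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical fold: 47/50 positives and 49/50 negatives correct
# gives accuracy 0.96, sensitivity 0.94, specificity 0.98.
acc, sen, spe = screening_metrics(tp=47, fn=3, tn=49, fp=1)
```

In a screening context, sensitivity (not missing affected fetuses) and specificity (not alarming normal pregnancies) trade off against each other, which is why the paper reports both alongside the threshold-independent ROC-AUC.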