A multi-centre, multi-device benchmark dataset for landmark-based comprehensive fetal biometry

📅 2025-12-18
🤖 AI Summary
Problem: Fetal ultrasound biometry depends on manually placed anatomical landmarks. Manual landmarking is time-consuming, operator-dependent, and sensitive to differences between scanners and sites, which limits reproducibility. Method: We introduce what is, to our knowledge, the first publicly available multi-centre, multi-device fetal ultrasound dataset with landmark annotations covering all primary biometric measures. It comprises 4,513 de-identified images from 1,904 subjects, acquired at three clinical sites on seven ultrasound devices. Expert annotations mark the landmarks for head bi-parietal and occipito-frontal diameters, abdominal transverse and antero-posterior diameters, and femoral length. We provide standardised, subject-disjoint train/test splits and use an automatic biometry model to quantify domain shift. Contribution/Results: We release the dataset, annotations, training code, and evaluation pipeline. Training and evaluation confined to a single centre substantially overestimate performance relative to multi-centre testing, so the dataset serves as a reproducible benchmark for domain adaptation and multi-centre generalisation in fetal biometry.

📝 Abstract
Accurate fetal growth assessment from ultrasound (US) relies on precise biometry measured by manually identifying anatomical landmarks in standard planes. Manual landmarking is time-consuming, operator-dependent, and sensitive to variability across scanners and sites, limiting the reproducibility of automated approaches. There is a need for multi-source annotated datasets to develop artificial intelligence-assisted fetal growth assessment methods. To address this bottleneck, we present an open, multi-centre, multi-device benchmark dataset of fetal US images with expert anatomical landmark annotations for clinically used fetal biometric measurements. These measurements include head bi-parietal and occipito-frontal diameters, abdominal transverse and antero-posterior diameters, and femoral length. The dataset contains 4,513 de-identified US images from 1,904 subjects acquired at three clinical sites using seven different US devices. We provide standardised, subject-disjoint train/test splits, evaluation code, and baseline results to enable fair and reproducible comparison of methods. Using an automatic biometry model, we quantify domain shift and demonstrate that training and evaluation confined to a single centre substantially overestimate performance relative to multi-centre testing. To the best of our knowledge, this is the first publicly available multi-centre, multi-device, landmark-annotated dataset that covers all primary fetal biometry measures, providing a robust benchmark for domain adaptation and multi-centre generalisation in fetal biometry and enabling more reliable AI-assisted fetal growth assessment across centres. All data, annotations, training code, and evaluation pipelines are made publicly available.
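The subject-disjoint splitting mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released split procedure: the record fields (`image`, `subject_id`), the function names, and the hash-based assignment are all assumptions made for the example. The point is only that every image from a given subject lands in the same partition, so no subject leaks from training into testing.

```python
import hashlib


def assign_split(subject_id: str, test_fraction: float = 0.2) -> str:
    """Map a subject ID to 'train' or 'test' deterministically.

    Hashing the subject ID (hypothetical field) means all images of one
    subject always receive the same label, regardless of input order.
    """
    digest = hashlib.sha256(subject_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def split_by_subject(records, test_fraction: float = 0.2):
    """Partition image records so that no subject appears in both splits."""
    splits = {"train": [], "test": []}
    for rec in records:
        splits[assign_split(rec["subject_id"], test_fraction)].append(rec)
    return splits


# Made-up records for illustration only; not taken from the dataset.
records = [
    {"image": "img_0001.png", "subject_id": "S001"},
    {"image": "img_0002.png", "subject_id": "S001"},
    {"image": "img_0003.png", "subject_id": "S002"},
    {"image": "img_0004.png", "subject_id": "S003"},
    {"image": "img_0005.png", "subject_id": "S003"},
]

splits = split_by_subject(records, test_fraction=0.3)
train_subjects = {r["subject_id"] for r in splits["train"]}
test_subjects = {r["subject_id"] for r in splits["test"]}
print("subjects shared between splits:", train_subjects & test_subjects)
```

A hash-based assignment only approximates the requested test fraction on small samples; the released splits should be used as provided when comparing against the published baselines.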

Problem

Research questions and friction points this paper is trying to address.

Develop AI-assisted fetal growth assessment from ultrasound images
Address variability in manual landmarking across scanners and sites
Provide a multi-centre dataset for robust domain adaptation
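One simple way to express the gap between single-centre and multi-centre performance is to compare the mean absolute measurement error on the training centre with the error on the remaining centres. The sketch below assumes a hypothetical result format (`centre`, `predicted_mm`, `reference_mm`) and uses made-up numbers; it is not the paper's evaluation pipeline or its reported results.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_error_by_centre(results):
    """Group absolute measurement errors (in mm) by acquisition centre."""
    errors = defaultdict(list)
    for r in results:
        errors[r["centre"]].append(abs(r["predicted_mm"] - r["reference_mm"]))
    return {centre: mean(values) for centre, values in errors.items()}


def domain_shift_gap(per_centre_mae, train_centre):
    """Mean error on unseen centres minus the error on the training centre.

    A positive gap means single-centre evaluation looks better than
    performance on data from other centres.
    """
    in_domain = per_centre_mae[train_centre]
    unseen = [m for c, m in per_centre_mae.items() if c != train_centre]
    return mean(unseen) - in_domain


# Made-up predictions for a model trained on centre "A" (illustration only).
results = [
    {"centre": "A", "predicted_mm": 50.0, "reference_mm": 51.0},
    {"centre": "A", "predicted_mm": 72.0, "reference_mm": 70.0},
    {"centre": "B", "predicted_mm": 48.0, "reference_mm": 51.0},
    {"centre": "B", "predicted_mm": 65.0, "reference_mm": 70.0},
    {"centre": "C", "predicted_mm": 55.0, "reference_mm": 51.0},
    {"centre": "C", "predicted_mm": 66.0, "reference_mm": 70.0},
]

per_centre = mean_abs_error_by_centre(results)
gap = domain_shift_gap(per_centre, train_centre="A")
print(per_centre, gap)
```

Reporting errors per centre, rather than pooling all test images, keeps a large in-domain test set from masking poor performance on smaller unseen centres.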

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open multi-centre, multi-device fetal ultrasound benchmark dataset
Expert landmark annotations for all primary fetal biometry measures
Publicly available data, annotations, and code enabling reproducible evaluation of domain adaptation
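Landmark-based biometry reduces each measurement to a distance between two annotated points. The sketch below shows that conversion from pixel coordinates to millimetres. The coordinate convention, the `pixel_spacing_mm` parameter, and the function name are assumptions for illustration; the dataset's actual annotation format may differ.

```python
import math


def landmark_distance_mm(p1, p2, pixel_spacing_mm):
    """Distance in millimetres between two landmarks given in pixels.

    p1, p2: (x, y) landmark positions in pixel coordinates (assumed format).
    pixel_spacing_mm: (spacing_x, spacing_y) physical size of one pixel.
    """
    dx = (p2[0] - p1[0]) * pixel_spacing_mm[0]
    dy = (p2[1] - p1[1]) * pixel_spacing_mm[1]
    return math.hypot(dx, dy)


# Hypothetical landmark pair for a femoral length measurement.
femur_start = (100, 200)
femur_end = (400, 600)
spacing = (0.1, 0.1)  # made-up pixel spacing in mm per pixel

femur_length_mm = landmark_distance_mm(femur_start, femur_end, spacing)
print(f"femoral length: {femur_length_mm:.1f} mm")
```

The same function applies to each diameter listed in the abstract (bi-parietal, occipito-frontal, abdominal transverse, abdominal antero-posterior), given the corresponding landmark pair.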