🤖 AI Summary
This study addresses key challenges in endometrial cancer screening with transvaginal ultrasound: low image contrast, high operator dependence, a scarcity of positive cases, and limited computational resources in primary care settings. To overcome these issues, the authors propose an efficient two-stage deep learning framework. First, a structure-guided cross-modal generative network synthesizes high-fidelity ultrasound images from unpaired MRI scans, preserving clinically essential anatomical junctions, to alleviate data scarcity. Second, a lightweight screening network uses gradient-based knowledge distillation from a high-capacity teacher model to dynamically guide sparse attention towards task-relevant regions. Evaluated on a multicenter cohort of 7,951 participants, the model achieves 99.5% sensitivity, 97.2% specificity, and an AUC of 0.987 at a computational cost of only 0.289 GFLOPs, substantially outperforming the average performance of expert sonographers.
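The summary does not specify how the teacher's gradients steer the student's sparse attention, so the following is only a minimal sketch of one common way such a scheme can be built, not the authors' method. It assumes (1) a toy linear teacher whose gradient-times-input saliency scores each image token, (2) the student attends only to the top-k most salient tokens, and (3) a standard Hinton-style distillation loss on temperature-softened logits. All function names (`linear_teacher_saliency`, `sparse_attention`, `kd_loss`) and the temperature value are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(xs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def linear_teacher_saliency(tokens: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Gradient-x-input saliency for a hypothetical linear teacher with logit = sum_i w . tokens[i].
    For this teacher d(logit)/d(tokens[i]) = w, so saliency_i = |w . tokens[i]|."""
    return [abs(sum(wj * tj for wj, tj in zip(w, tok))) for tok in tokens]


def topk_indices(saliency: Sequence[float], k: int) -> List[int]:
    """Indices of the k most salient tokens, highest first."""
    return sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)[:k]


def sparse_attention(query: Sequence[float], keys: Sequence[Sequence[float]],
                     values: Sequence[Sequence[float]], saliency: Sequence[float],
                     k: int) -> Tuple[List[float], List[int]]:
    """Scaled dot-product attention restricted to the k tokens the teacher marks as salient."""
    keep = topk_indices(saliency, k)
    d = len(query)
    scores = [sum(q * kv for q, kv in zip(query, keys[i])) / math.sqrt(d) for i in keep]
    weights = softmax(scores)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for j in range(len(out)):
            out[j] += w * values[i][j]
    return out, keep


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            temperature: float = 4.0) -> float:
    """Hinton-style distillation: T^2 * KL(teacher || student) on softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl


if __name__ == "__main__":
    tokens = [[0.2, 0.1], [0.9, 0.3], [0.1, 0.0], [0.4, 0.5]]  # toy patch features
    sal = linear_teacher_saliency(tokens, [1.0, 1.0])
    out, keep = sparse_attention([1.0, 0.0], tokens, tokens, sal, k=2)
    print("kept tokens:", keep, "attended output:", out)
    print("KD loss:", kd_loss([0.0, 0.0], [2.0, 0.0]))
```

In this toy run the student ignores the two low-saliency tokens entirely, which is where the computational saving of sparse attention comes from; in a real network the saliency would come from backpropagating through the teacher rather than from a closed-form linear gradient.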
📝 Abstract
Early detection of myometrial invasion is critical for the staging and life-saving management of endometrial carcinoma (EC), a prevalent malignancy worldwide. Transvaginal ultrasound serves as the primary, accessible screening modality in resource-constrained primary care settings; however, its diagnostic reliability is severely limited by low tissue contrast, high operator dependence, and a pronounced scarcity of positive pathological samples. Existing artificial intelligence solutions struggle with the resulting class imbalance and with the subtle imaging features of invasion, particularly under the strict computational limits of primary care clinics. Here we present an automated, highly efficient two-stage deep learning framework that addresses both the data and the computational bottlenecks in EC screening. To mitigate pathological data scarcity, we develop a structure-guided cross-modal generation network that synthesizes diverse, high-fidelity ultrasound images from unpaired magnetic resonance imaging (MRI) data while strictly preserving clinically essential anatomical junctions. Furthermore, we introduce a lightweight screening network that uses gradient distillation, which transfers discriminative knowledge from a high-capacity teacher model to dynamically guide sparse attention towards task-critical regions. Evaluated on a large multicenter cohort of 7,951 participants, our model achieves a sensitivity of 99.5%, a specificity of 97.2%, and an area under the curve (AUC) of 0.987 at a minimal computational cost (0.289 GFLOPs), substantially outperforming the average diagnostic accuracy of expert sonographers. Our approach demonstrates that combining cross-modal synthetic augmentation with knowledge-driven efficient modeling can democratize expert-level, real-time cancer screening in resource-constrained primary care settings.
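For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and AUC are computed for a binary screening task. It is illustrative only: the labels and scores are made-up toy values, not data from the study, and the 0.5 decision threshold is an assumption. AUC is computed with the pairwise (Mann-Whitney) formulation, i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as half.

```python
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) for binary labels where 1 = positive (e.g. invasion present)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1  # positive case missed by the model
        elif p == 0:
            tn += 1
        else:
            fp += 1  # negative case flagged as positive
    return tp, fn, tn, fp


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC: fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos: List[float] = [s for s, t in zip(scores, y_true) if t == 1]
    neg: List[float] = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # toy labels
    scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.15]  # toy model scores
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
    sens, spec = sensitivity_specificity(y_true, y_pred)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={roc_auc(y_true, scores):.3f}")
    # prints: sensitivity=0.750 specificity=0.833 auc=0.917
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds; for screening, a threshold is typically chosen to favor sensitivity so that few invasive cases are missed.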