🤖 AI Summary
Current ultrasound AI systems are predominantly single-task models, which limits their fit with integrated clinical workflows. Method: SMART, the top-ranked model in the UUSIC25 challenge, is a cross-organ (liver, kidney, thyroid, breast, and fetal) general-purpose deep learning framework that uses a unified encoder-decoder architecture and multi-task joint optimization to support image classification and pixel-level segmentation in a single model. Contribution/Results: On segmentation, SMART achieves a macro-averaged Dice score of 0.854 across five tasks, with fetal head segmentation reaching 0.942. For binary classification, its AUC is 0.766. In breast cancer molecular subtyping, AUC falls from 0.571 on internal data to 0.508 on an unseen external center, which is close to chance level (0.5). The multi-center evaluation demonstrates the feasibility of replacing fragmented task-specific models with one general architecture, while identifying domain generalization as the critical bottleneck for clinical deployment.
📝 Abstract
IMPORTANCE: Current ultrasound AI remains fragmented into single-task tools, limiting clinical utility compared to versatile modern ultrasound systems.
OBJECTIVE: To evaluate the diagnostic accuracy and efficiency of single general-purpose deep learning models for multi-organ classification and segmentation.
DESIGN: In the Universal UltraSound Image Challenge 2025 (UUSIC25), participants developed algorithms on 11,644 images from public and private sources. Evaluation used an independent, multi-center test set of 2,479 images, including data from a center completely unseen during training, to assess generalization.
OUTCOMES: Diagnostic performance (Dice Similarity Coefficient [DSC]; Area Under the Receiver Operating Characteristic Curve [AUC]) and computational efficiency (inference time, GPU memory).
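For readers less familiar with the two diagnostic metrics, the following is a minimal sketch of how DSC and AUC are commonly computed. It is not the challenge's evaluation code: the function names `dice_coefficient` and `roc_auc`, and the convention of scoring two empty masks as 1.0, are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    if denom == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


# Toy inputs (hypothetical, not challenge data)
toy_dice = dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]])
toy_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

The pairwise AUC formulation is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable at the scale of a full test set.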
RESULTS: Of 15 valid algorithms, the top model (SMART) achieved a macro-averaged DSC of 0.854 across 5 segmentation tasks and AUC of 0.766 for binary classification. Models showed high capability in segmentation (e.g., fetal head DSC: 0.942) but variability in complex tasks subject to domain shift. Notably, in breast cancer molecular subtyping, the top model's performance dropped from AUC 0.571 (internal) to 0.508 (unseen external center), highlighting generalization challenges.
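The size of the generalization drop reported above can be made explicit with a short calculation. The sketch below uses only the two AUC values from the results (0.571 internal, 0.508 external); the helper names `macro_average` and `generalization_gap` are hypothetical, and the per-task scores passed to `macro_average` are placeholder values, not reported results.

```python
def macro_average(per_task_scores):
    """Unweighted mean over tasks, so each task counts equally regardless of size."""
    values = list(per_task_scores.values())
    return sum(values) / len(values)


def generalization_gap(internal, external):
    """Absolute and relative drop from internal to external performance."""
    absolute = internal - external
    relative = absolute / internal
    return absolute, relative


# AUC values reported for breast cancer molecular subtyping
abs_drop, rel_drop = generalization_gap(0.571, 0.508)

# Margin of the external-center AUC over chance level (0.5)
margin_over_chance = 0.508 - 0.5

# Placeholder per-task scores, for illustration only
example_macro = macro_average({"task_a": 0.90, "task_b": 0.80})

print(f"absolute drop: {abs_drop:.3f}")
print(f"relative drop: {rel_drop:.1%}")
print(f"margin over chance: {margin_over_chance:.3f}")
```

On the reported numbers this gives an absolute drop of 0.063 AUC, roughly 11% relative to the internal score, leaving the external-center result only 0.008 above chance.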
CONCLUSIONS: General-purpose AI models achieve high accuracy and efficiency across multiple tasks using a single architecture. However, performance degradation on unseen data suggests domain generalization is critical for future clinical deployment.