Diagnostic Performance of Universal-Learning Ultrasound AI Across Multiple Organs and Tasks: the UUSIC25 Challenge

📅 2025-12-19

🤖 AI Summary

Current ultrasound AI systems predominantly rely on single-task models, which limits how well they fit integrated clinical workflows. Method: The UUSIC25 challenge benchmarked general-purpose deep learning models, each expected to cover multiple organs (liver, kidney, thyroid, breast, and fetal) and tasks with a single architecture. The top-ranked entry, SMART, uses a unified encoder-decoder architecture with multi-task joint optimization to support both image classification and pixel-level segmentation. Contribution/Results: Evaluation on an independent multi-center test set, including a center unseen during training, suggests that a single general-purpose model is a feasible replacement for fragmented task-specific models, while domain generalization remains the main bottleneck for clinical deployment. On segmentation, SMART achieves a macro-averaged Dice score of 0.854 across five tasks, with fetal head segmentation reaching 0.942. For binary classification it reaches an AUC of 0.766, but for breast cancer molecular subtyping its AUC drops from 0.571 on internal data to 0.508 on the unseen external center. These results support general-purpose ultrasound AI as a path toward clinical deployment, provided that domain shift is addressed.
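The summary mentions multi-task joint optimization for classification and segmentation. Below is a minimal sketch of what such an objective can look like, assuming a binary cross-entropy term for classification and a soft Dice term for segmentation, combined with a hypothetical weight `seg_weight`. It is not SMART's published loss. It also assumes that a training image may carry only one of the two labels, in which case the missing task's term is skipped; all function names are illustrative.

```python
import math
from typing import Optional, Sequence


def binary_cross_entropy(prob: float, label: int, eps: float = 1e-7) -> float:
    """Classification loss for one image; prob is the predicted P(label == 1)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def soft_dice_loss(pred: Sequence[float], target: Sequence[int], eps: float = 1e-6) -> float:
    """1 - soft Dice between a predicted probability mask and a binary mask (both flattened)."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def joint_loss(
    cls_prob: float,
    cls_label: Optional[int],
    seg_pred: Optional[Sequence[float]],
    seg_target: Optional[Sequence[int]],
    seg_weight: float = 1.0,  # hypothetical balancing weight, not taken from the paper
) -> float:
    """Sum of the task losses that have a label; an unlabeled task contributes nothing."""
    terms = []
    if cls_label is not None:
        terms.append(binary_cross_entropy(cls_prob, cls_label))
    if seg_pred is not None and seg_target is not None:
        terms.append(seg_weight * soft_dice_loss(seg_pred, seg_target))
    if not terms:
        raise ValueError("image has neither a classification nor a segmentation label")
    return sum(terms)
```

For example, `joint_loss(0.9, 1, [1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0])` is about 0.105: the mask prediction is perfect, so only the classification term `-log(0.9)` remains. Masking absent labels lets one shared model train on datasets that annotate different tasks.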

📝 Abstract

IMPORTANCE: Current ultrasound AI remains fragmented into single-task tools, limiting clinical utility compared to versatile modern ultrasound systems.

OBJECTIVE: To evaluate the diagnostic accuracy and efficiency of single general-purpose deep learning models for multi-organ classification and segmentation.

DESIGN: The Universal UltraSound Image Challenge 2025 (UUSIC25) involved developing algorithms on 11,644 images (public/private). Evaluation used an independent, multi-center test set of 2,479 images, including data from a center completely unseen during training to assess generalization.

OUTCOMES: Diagnostic performance (Dice Similarity Coefficient [DSC]; Area Under the Receiver Operating Characteristic Curve [AUC]) and computational efficiency (inference time, GPU memory).

RESULTS: Of 15 valid algorithms, the top model (SMART) achieved a macro-averaged DSC of 0.854 across 5 segmentation tasks and AUC of 0.766 for binary classification. Models showed high capability in segmentation (e.g., fetal head DSC: 0.942) but variability in complex tasks subject to domain shift. Notably, in breast cancer molecular subtyping, the top model's performance dropped from AUC 0.571 (internal) to 0.508 (unseen external center), highlighting generalization challenges.

CONCLUSIONS: General-purpose AI models achieve high accuracy and efficiency across multiple tasks using a single architecture. However, performance degradation on unseen data suggests domain generalization is critical for future clinical deployment.
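The abstract reports segmentation as a macro-averaged DSC and classification as AUC, and it compares internal with external-center performance. The sketch below shows how these quantities are commonly computed: per-image Dice on binary masks, an unweighted (macro) mean over tasks, and a rank-based binary ROC AUC, plus a per-center split. It assumes flattened 0/1 masks and binary labels. The challenge's official evaluation code, its handling of empty masks, and how multi-class tasks such as molecular subtyping were averaged are not described here, so the function names (`dice`, `task_dsc`, `macro_average`, `auc`, `auc_by_center`) and conventions are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice Similarity Coefficient (DSC) between two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # both masks empty: scored as perfect agreement (a convention choice)
    return 2.0 * inter / total


def task_dsc(pairs: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Mean per-image DSC for one segmentation task; pairs are (prediction, ground truth)."""
    return mean(dice(p, t) for p, t in pairs)


def macro_average(per_task: Dict[str, float]) -> float:
    """Unweighted mean over tasks, so each task counts once regardless of its image count."""
    return mean(per_task.values())


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Binary ROC AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def auc_by_center(scores, labels, centers) -> Dict[str, float]:
    """Separate AUC per acquisition center, e.g. to compare internal vs unseen external data."""
    groups = defaultdict(lambda: ([], []))
    for s, y, c in zip(scores, labels, centers):
        groups[c][0].append(s)
        groups[c][1].append(y)
    return {c: auc(s, y) for c, (s, y) in groups.items()}
```

Running `auc_by_center` on a model's scores with a center column yields one AUC per center, which is how a gap such as the reported 0.571 (internal) versus 0.508 (unseen external) would surface. The pairwise AUC loop is O(P·N) and is written for readability; a sorting-based rank computation scales better to large test sets.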

Problem

Evaluating general-purpose AI models for multi-organ ultrasound classification and segmentation

Assessing diagnostic accuracy and computational efficiency across diverse ultrasound tasks

Investigating domain generalization challenges in AI models for clinical deployment

Innovation

Single deep learning model for multi-organ classification and segmentation

Evaluated on an independent multi-center test set, including a center unseen during training

Highlights domain generalization as a key challenge for clinical deployment
