UltraUNet: Real-Time Ultrasound Tongue Segmentation for Diverse Linguistic and Imaging Conditions

📅 2025-09-27
🤖 AI Summary
Ultrasound tongue image segmentation is difficult because of low signal-to-noise ratios, high imaging variability, and strict real-time requirements. To address these challenges, this paper proposes UltraUNet, a lightweight encoder-decoder network. The architecture combines lightweight Squeeze-and-Excitation blocks, Group Normalization for stable training with small batch sizes, and summation-based skip connections that reduce computational and memory overhead. Ultrasound-specific augmentations, including denoising and blur simulation, are integrated to improve robustness and cross-domain generalization. Evaluated on eight heterogeneous datasets, the method achieves a single-dataset Dice score of 0.855 and cross-dataset Dice scores averaging 0.734 and 0.761, with inference running at 250 frames per second. These results indicate a favorable trade-off among accuracy, generalizability, and real-time performance, supporting multilingual speech research, clinical diagnostics, and analysis of speech motor disorders.
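The small-batch stability attributed to Group Normalization comes from where the statistics are computed: per sample, over groups of channels, rather than across the batch. The following is a minimal pure-Python sketch of that idea, not the paper's implementation. It assumes one sample stored as a list of channels (each a flat list of pixel values) and omits the learnable scale and shift; the name `group_norm` and the group count are illustrative.

```python
import math

def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample given as a list of channels (flat lists of floats).

    Mean and variance are computed per group of channels inside this sample
    only, so the output does not depend on the batch size.
    Learnable scale/shift parameters are omitted for brevity.
    """
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        channels = sample[g * per_group:(g + 1) * per_group]
        values = [v for ch in channels for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for ch in channels:
            out.append([(v - mean) * inv_std for v in ch])
    return out

# Hypothetical 4-channel sample with 2 pixels per channel, split into 2 groups.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(sample, num_groups=2)
```

Because the function only ever sees a single sample, its output for that sample is the same whether the batch holds 1 image or 32, which is the property that makes the technique attractive when memory limits the batch size.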

📝 Abstract
Ultrasound tongue imaging (UTI) is a non-invasive and cost-effective tool for studying speech articulation, motor control, and related disorders. However, real-time tongue contour segmentation remains challenging due to low signal-to-noise ratios, imaging variability, and computational demands. We propose UltraUNet, a lightweight encoder-decoder architecture optimized for real-time segmentation of tongue contours in ultrasound images. UltraUNet incorporates domain-specific innovations such as lightweight Squeeze-and-Excitation blocks, Group Normalization for small-batch stability, and summation-based skip connections to reduce memory and computational overhead. It achieves 250 frames per second and integrates ultrasound-specific augmentations like denoising and blur simulation. Evaluations on 8 datasets demonstrate high accuracy and robustness, with single-dataset Dice = 0.855 and MSD = 0.993px, and cross-dataset Dice averaging 0.734 and 0.761. UltraUNet provides a fast, accurate solution for speech research, clinical diagnostics, and analysis of speech motor disorders.
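The two reported metrics can be illustrated with a short sketch. It assumes the usual definitions: Dice as twice the overlap divided by the total foreground area of both masks, and MSD (mean sum of distances) as the symmetric average nearest-point distance between two contours, as commonly used in the tongue-contour literature. The abstract does not spell out the exact formulas, so treat these as assumptions; the function names and toy inputs are hypothetical.

```python
import math

def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks (nested lists)."""
    inter = 0
    total = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * inter / total

def mean_sum_of_distances(contour_u, contour_v):
    """Symmetric mean nearest-point distance between two contours, in pixels."""
    if not contour_u or not contour_v:
        raise ValueError("both contours must contain at least one point")

    def nearest_sum(src, dst):
        # For each source point, distance to its closest point on the other contour.
        return sum(min(math.dist(p, q) for q in dst) for p in src)

    n_points = len(contour_u) + len(contour_v)
    return (nearest_sum(contour_u, contour_v)
            + nearest_sum(contour_v, contour_u)) / n_points

# Toy masks and contours, made up for illustration only.
pred_mask = [[0, 1, 1], [0, 1, 0]]
true_mask = [[0, 1, 0], [0, 1, 1]]
pred_contour = [(0, 0), (1, 0), (2, 0)]
true_contour = [(0, 1), (1, 1), (2, 1)]
dice = dice_score(pred_mask, true_mask)
msd = mean_sum_of_distances(pred_contour, true_contour)
```

Dice rewards region overlap, while MSD measures how far the predicted contour sits from the reference in pixels, so the two numbers capture complementary kinds of error.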

Problem

Research questions and friction points this paper is trying to address.

Real-time tongue contour segmentation in ultrasound images
Overcoming low signal-to-noise ratios and imaging variability
Reducing computational demands for speech research applications
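To make the real-time requirement concrete, the reported 250 frames per second corresponds to a budget of 4 ms per frame. The sketch below shows one way to measure throughput for any per-frame function. Here `process_frame` is a hypothetical stand-in for a segmentation model, and the harness is a generic CPU wall-clock timer, not the paper's benchmarking protocol. Timing a GPU model would additionally require synchronizing the device before reading the clock.

```python
import time

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a target frame rate."""
    return 1000.0 / fps

def measure_fps(process_frame, frames, warmup=5):
    """Average frames per second of `process_frame` over `frames`."""
    for frame in frames[:warmup]:
        process_frame(frame)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

# Hypothetical workload: a trivial function over dummy "frames".
dummy_frames = [[float(i)] * 256 for i in range(50)]
fps = measure_fps(lambda frame: sum(frame), dummy_frames)
budget = frame_budget_ms(250)  # 4.0 ms per frame at 250 FPS
```

Comparing the measured time per frame against the budget shows how much headroom remains for acquisition, display, or downstream analysis in a live pipeline.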

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight encoder-decoder architecture for real-time segmentation
Domain-specific blocks and normalization for computational efficiency
Ultrasound-specific augmentations enhancing accuracy and robustness
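The saving from summation-based skip connections can be seen with simple arithmetic. Concatenating encoder and decoder features doubles the input channels of the next convolution, whereas summing them keeps the channel count unchanged. The sketch below compares the two options for one decoder convolution. The layer width (64 channels), feature-map size (128 x 128), and 3 x 3 kernel are illustrative assumptions and are not taken from the paper.

```python
def conv_params(in_channels, out_channels, kernel_size=3, bias=True):
    """Parameter count of a standard 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)

def decoder_conv_cost(channels, height, width, skip="sum", kernel_size=3):
    """Cost of the first decoder convolution after merging a skip connection.

    Assumes encoder and decoder features each have `channels` channels and
    the convolution outputs `channels` channels at the same resolution.
    """
    if skip not in ("sum", "concat"):
        raise ValueError("skip must be 'sum' or 'concat'")
    # Summation keeps C input channels; concatenation produces 2C.
    in_channels = channels if skip == "sum" else 2 * channels
    params = conv_params(in_channels, channels, kernel_size)
    # One multiply-accumulate per weight per output pixel.
    macs = kernel_size * kernel_size * in_channels * channels * height * width
    merged_activations = in_channels * height * width
    return {"params": params, "macs": macs,
            "merged_activations": merged_activations}

# Illustrative layer: 64 channels on a 128 x 128 feature map.
sum_cost = decoder_conv_cost(64, 128, 128, skip="sum")
concat_cost = decoder_conv_cost(64, 128, 128, skip="concat")
```

Under these assumptions, the summed variant needs half the multiply-accumulates and half the merged activation memory of the concatenated one for that layer, and the saving repeats at every decoder stage.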

Authors

Alisher Myrgyyassov
Biomedical Engineering Department, Hong Kong Polytechnic University, Hong Kong, China
Zhen Song
Siemens Corporation, Corporate Technology
Building automation, building energy management, optimal control, robotics, optimization
Yu Sun
Biomedical Engineering Department, Hong Kong Polytechnic University, Hong Kong, China
Bruce Xiao Wang
Department of English and Communication, Hong Kong Polytechnic University
Forensic phonetics, likelihood ratio, uncertainty, statistics, speech prosody
Min Ney Wong
Department of Chinese and Bilingual Studies, Hong Kong Polytechnic University, Hong Kong, China
Yongping Zheng
Department of Biomedical Engineering, Research Institute for Smart Ageing, Hong Kong Polytechnic University, Hong Kong, China