🤖 AI Summary
This study addresses a bottleneck in speech-related ultrasound research: manual measurement of geniohyoid muscle morphology is slow and inefficient, which hinders large-scale studies. To overcome this limitation, the authors propose the SMMA framework, described as the first fully automated method for high-precision quantification of muscle thickness. By integrating deep learning–based semantic segmentation with skeleton-based thickness computation, the framework enables dynamic analysis of morphological changes during speech production. Evaluated on a Cantonese vowel task, the method achieves near-expert performance (Dice = 0.9037, MAE = 0.53 mm, r = 0.901) and shows that the geniohyoid muscle is significantly thicker during /a:/ than /i:/ production and that male participants exhibit 5–8% greater thickness than females. This work establishes a scalable technical foundation for objective assessment of speech motor control and swallowing disorders.
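As a rough illustration of what skeleton-based thickness computation can look like, the sketch below takes a binary muscle mask, extracts its skeleton (medial axis), and reads the distance to the nearest background pixel at each skeleton point. It is not the SMMA implementation. The function name `skeleton_thickness_mm`, the `mm_per_pixel` calibration parameter, the "twice the distance" rule, the median aggregation, and the choice of SciPy and scikit-image are all assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize


def skeleton_thickness_mm(mask, mm_per_pixel):
    """Estimate the thickness of a binary mask along its skeleton.

    Hypothetical helper: the local thickness at a skeleton pixel is
    approximated as twice its Euclidean distance to the background,
    and the median over the skeleton is reported in millimetres.
    """
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return float("nan")  # nothing segmented in this frame

    skeleton = skeletonize(mask)            # one-pixel-wide medial line
    distance = distance_transform_edt(mask)  # distance to nearest background pixel
    local_thickness_px = 2.0 * distance[skeleton]

    # The median is robust to the skeleton ends, where the distance shrinks.
    return float(np.median(local_thickness_px) * mm_per_pixel)


# Synthetic example: a horizontal band 10 pixels thick, at an assumed
# calibration of 0.1 mm per pixel.
demo_mask = np.zeros((30, 80), dtype=bool)
demo_mask[10:20, 10:70] = True
demo_thickness = skeleton_thickness_mm(demo_mask, mm_per_pixel=0.1)
```

On a discrete grid this estimate can be off by about one pixel, so a real pipeline would need to fix a convention and validate it against expert measurements.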
📝 Abstract
Manual measurement of muscle morphology from ultrasound during speech is time-consuming and limits large-scale studies. We present SMMA, a fully automated framework that combines deep-learning segmentation with skeleton-based thickness quantification to analyze geniohyoid (GH) muscle dynamics. Validation demonstrates near-human-level accuracy (Dice = 0.9037, MAE = 0.53 mm, r = 0.901). Application to Cantonese vowel production (N = 11) reveals systematic patterns: /a:/ shows significantly greater GH thickness (7.29 mm) than /i:/ (5.95 mm, p < 0.001, Cohen's d > 1.3), suggesting greater GH activation for /a:/, consistent with the muscle's role in mandibular depression. Sex differences (5–8% greater thickness in males) reflect anatomical scaling. SMMA achieves expert-validated accuracy while eliminating the need for manual annotation, enabling scalable investigations of speech motor control and objective assessment of speech and swallowing disorders.
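For readers unfamiliar with the reported statistics, the following minimal sketch shows the standard definitions of the Dice coefficient, mean absolute error (MAE), Pearson's r, and Cohen's d using NumPy. The function names and toy inputs are illustrative only. The pooled-standard-deviation form of Cohen's d is an assumption, since the abstract does not state which variant was used.

```python
import numpy as np


def dice(pred_mask, true_mask):
    """Dice overlap between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    total = pred.sum() + true.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, true).sum() / total)


def mae(auto_values, manual_values):
    """Mean absolute error between automated and manual measurements."""
    a = np.asarray(auto_values, dtype=float)
    m = np.asarray(manual_values, dtype=float)
    return float(np.mean(np.abs(a - m)))


def pearson_r(auto_values, manual_values):
    """Pearson correlation between automated and manual measurements."""
    return float(np.corrcoef(auto_values, manual_values)[0, 1])


def cohens_d(x, y):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return float((x.mean() - y.mean()) / np.sqrt(pooled_var))


# Toy inputs, not data from the study.
toy_dice = dice([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
toy_mae = mae([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])
toy_r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
toy_d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

With a within-subject design such as the vowel comparison here, a paired effect size based on the standard deviation of the differences is a common alternative to the pooled form.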