🤖 AI Summary
Automatic assessment of stuttering severity is limited by low annotation quality and unreliable models, largely because clinical expertise is rarely built into the data. Method: We extend FluencyBank into a clinical-grade multimodal stuttering annotation resource, labeled by speech-language pathologists following established clinical standards and covering stuttering events, secondary behaviors, and multidimensional tension scores in audiovisual recordings. A high-reliability test set is established via expert consensus, and the joint annotation scheme (consensus-driven multidimensional tension scoring combined with secondary-behavior labeling) keeps the labels aligned with real-world clinical practice. Contribution/Results: Experiments on the resulting dataset demonstrate the task's strong dependence on domain-specific clinical knowledge, providing a benchmark for model training, interpretability analysis, and fair, clinically grounded evaluation of stuttering severity assessment systems.
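To make the joint annotation scheme concrete, here is a minimal Python sketch of what a single annotation record might look like under such a scheme. The field names, label inventories, and tension scale are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical label inventories (assumed for illustration; the paper's
# actual clinical taxonomy may differ).
STUTTER_EVENTS = {"repetition", "prolongation", "block", "interjection"}
SECONDARY_BEHAVIORS = {"eye_blink", "head_movement", "facial_tension"}

@dataclass
class StutterAnnotation:
    """One annotated stuttering moment in an audiovisual recording."""
    start_s: float                                  # onset, in seconds
    end_s: float                                    # offset, in seconds
    event_type: str                                 # one of STUTTER_EVENTS
    secondary_behaviors: List[str] = field(default_factory=list)
    tension_score: int = 0                          # assumed 0-3 clinical rating

# Example record for one annotated moment.
record = StutterAnnotation(
    start_s=12.4, end_s=13.1, event_type="block",
    secondary_behaviors=["eye_blink"], tension_score=2,
)
assert record.event_type in STUTTER_EVENTS
```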
📝 Abstract
Stuttering is a complex disorder that requires specialized expertise for effective assessment and treatment. This paper presents an effort to enhance the FluencyBank dataset with a new stuttering annotation scheme based on established clinical standards. To achieve high-quality annotations, we hired expert clinicians to label the data, ensuring that the resulting annotations mirror real-world clinical expertise. The annotations are multimodal, incorporating audiovisual features for the detection and classification of stuttering moments, secondary behaviors, and tension scores. In addition to individual annotations, we provide a test set with highly reliable labels based on expert consensus for assessing both individual annotators and machine learning models. Our experiments and analysis illustrate the complexity of this task, which necessitates extensive clinical expertise for valid training and evaluation of stuttering assessment models.
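Since the consensus test set is intended for scoring both human annotators and machine learning models, a natural evaluation is chance-corrected agreement with the consensus labels. Below is a minimal sketch using scikit-learn's cohen_kappa_score; the label sequences are made-up placeholders, and the paper's actual agreement metric may differ.

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder per-segment labels: expert consensus vs. one annotator or model.
consensus = ["block", "repetition", "none", "prolongation", "none"]
predicted = ["block", "repetition", "none", "block", "none"]

# Cohen's kappa corrects raw accuracy for the agreement expected by chance.
kappa = cohen_kappa_score(consensus, predicted)
print(f"Agreement with expert consensus (Cohen's kappa): {kappa:.2f}")
```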