Reference-free automatic speech severity evaluation using acoustic unit language modelling

📅 2025-10-01
🤖 AI Summary
Existing speech severity assessment models often generalize poorly, overfitting to dataset-specific acoustic cues, and typically rely on reference speech or text transcriptions, which limits their applicability to spontaneous, real-world speech. To address these limitations, we explore SpeechLMScore, a reference-free method grounded in acoustic unit language modeling that does not rely on pathological speech data. To support a more comprehensive evaluation, we introduce the NKI-SpeechRT dataset, based on the NKI-CCRT dataset, and use its subjective noise ratings to analyze model robustness. The study compares SpeechLMScore with traditional acoustic feature-based approaches and assesses the performance gap between reference-free and reference-based models. The results indicate that SpeechLMScore is robust to noise and outperforms traditional approaches, without requiring ground-truth transcriptions, reference utterances, or pathological speech data.

📝 Abstract
Speech severity evaluation is becoming increasingly important as the economic burden of speech disorders grows. Current speech severity models often struggle with generalization, learning dataset-specific acoustic cues rather than meaningful correlates of speech severity. Furthermore, many models require reference speech or a transcript, limiting their applicability in ecologically valid scenarios, such as spontaneous speech evaluation. Previous research indicated that automatic speech naturalness evaluation scores correlate strongly with severity evaluation scores, leading us to explore a reference-free method, SpeechLMScore, which does not rely on pathological speech data. Additionally, we present the NKI-SpeechRT dataset, based on the NKI-CCRT dataset, to provide a more comprehensive foundation for speech severity evaluation. This study evaluates whether SpeechLMScore outperforms traditional acoustic feature-based approaches and assesses the performance gap between reference-free and reference-based models. Moreover, we examine the impact of noise on these models by utilizing subjective noise ratings in the NKI-SpeechRT dataset. The results demonstrate that SpeechLMScore is robust to noise and offers superior performance compared to traditional approaches.
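As a rough illustration of the acoustic unit language modelling idea, the sketch below scores a sequence of discrete unit IDs by its average per-unit log-probability under a language model trained only on typical speech. Everything here is an assumption for illustration: the add-alpha bigram model stands in for whatever neural unit language model is actually used, the unit sequences are hypothetical, and the names `train_bigram_lm` and `unit_lm_score` are not from the paper. A real pipeline would first convert audio into units with a speech encoder and quantizer, which is omitted here.

```python
import math
from collections import Counter, defaultdict


def train_bigram_lm(sequences, vocab_size, alpha=1.0):
    """Fit an add-alpha smoothed bigram model over discrete unit IDs.

    Stand-in for a neural unit language model (assumption for illustration).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1

    def log_prob(prev, cur):
        # Smoothed conditional log-probability log P(cur | prev).
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * vocab_size))

    return log_prob


def unit_lm_score(units, log_prob):
    """Average log-probability per predicted unit; higher means more typical."""
    if len(units) < 2:
        raise ValueError("need at least two units to score a sequence")
    total = sum(log_prob(p, c) for p, c in zip(units, units[1:]))
    return total / (len(units) - 1)


# Hypothetical unit sequences standing in for tokenized typical speech.
train = [[0, 1, 2, 3, 0, 1, 2, 3], [0, 1, 2, 3, 0, 1]]
lm = train_bigram_lm(train, vocab_size=4)

typical = unit_lm_score([0, 1, 2, 3, 0, 1], lm)
atypical = unit_lm_score([3, 1, 0, 2, 2, 0], lm)
print(f"typical: {typical:.3f}, atypical: {atypical:.3f}")
```

Because the model never sees pathological speech, no severity labels, transcripts, or reference recordings are needed at scoring time; a sequence that deviates from typical unit patterns simply receives a lower score.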
Problem

Research questions and friction points this paper is trying to address.

Developing reference-free speech severity evaluation without pathological data
Addressing generalization issues in current speech severity assessment models
Evaluating noise robustness in automatic speech severity evaluation methods
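One way the noise-robustness question above could be probed is to compare how well automatic scores track severity ratings within subsets of recordings grouped by noise rating. The sketch below is a minimal version of that idea, assuming Spearman rank correlation as the agreement measure (the source does not state which measure is used). The records, the noise labels, and the convention that a higher severity rating means more severe speech are all hypothetical.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_by_noise(records):
    """Group (noise_label, score, severity) records and correlate per group."""
    groups = defaultdict(list)
    for noise, score, severity in records:
        groups[noise].append((score, severity))
    return {
        noise: spearman([s for s, _ in pairs], [v for _, v in pairs])
        for noise, pairs in groups.items()
    }


# Hypothetical data: (noise rating, automatic score, severity rating).
records = [
    ("low", -1.0, 1), ("low", -1.5, 2), ("low", -2.0, 3), ("low", -2.5, 4),
    ("high", -1.2, 1), ("high", -2.4, 2), ("high", -1.8, 3), ("high", -2.6, 4),
]
result = correlation_by_noise(records)
print(result)
```

A method that is robust to noise would show similar correlation strength across the noise groups, whereas a method that has learned recording-specific cues would tend to degrade in the noisier group.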
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reference-free speech severity evaluation method
Acoustic unit language modeling approach
Noise-robust performance
Bence Mark Halpern
Nagoya University, Nagoya, Japan
Tomoki Toda
Nagoya University
Signal Processing, Speech Processing, Speech Synthesis