Evaluating ASR robustness to spontaneous speech errors: A study of WhisperX using a Speech Error Database

📅 2025-08-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the robustness evaluation of automatic speech recognition (ASR) systems against spontaneous speech errors. We introduce SFUSED—the first English spontaneous speech error corpus with multi-level linguistic annotations (word- and syllable-level error localization, context sensitivity, correction patterns), comprising 5,300 utterances—and employ its structured error annotation schema for ASR diagnostics, conducting a systematic evaluation of WhisperX. Methodologically, we integrate degraded word detection, error type classification, and correction behavior modeling to enable fine-grained error attribution. Results demonstrate that SFUSED effectively exposes ASR vulnerabilities in realistic conversational contexts: WhisperX exhibits strong robustness against repetitions and fillers but shows significant limitations in handling phonological illusions and context-dependent corrections. This work establishes a reproducible benchmark framework and diagnostic paradigm for ASR robustness assessment.

📝 Abstract
The Simon Fraser University Speech Error Database (SFUSED) is a public data collection developed for linguistic and psycholinguistic research. Here we demonstrate how its design and annotations can be used to test and evaluate speech recognition models. The database comprises systematically annotated speech errors from spontaneous English speech, with each error tagged for intended and actual error productions. The annotation schema incorporates multiple classificatory dimensions relevant to model assessment, including linguistic hierarchical level, contextual sensitivity, degraded words, word corrections, and both word-level and syllable-level error positioning. To assess the value of these classificatory variables, we evaluated the transcription accuracy of WhisperX across 5,300 documented word and phonological errors. This analysis demonstrates the database's effectiveness as a diagnostic tool for ASR system performance.
Problem

Research questions and friction points this paper is trying to address.

Assessing ASR robustness to spontaneous speech errors
Evaluating WhisperX using annotated speech error data
Testing ASR performance with linguistic error classifications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using SFUSED to test ASR robustness
Evaluating WhisperX on annotated speech errors
Assessing ASR performance with multi-dimensional annotations
John Alderete
Linguistics and Cognitive Science, Simon Fraser University, Canada
Macarius Kin Fung Hui
Khoury College of Computer Sciences, Northeastern University, Vancouver, BC, Canada
Aanchan Mohan
Northeastern University
Speech Recognition · Machine Learning · Acoustic Modelling · Speaker Verification · Multi-lingual Speech Recognition