Beyond WER: Probing Whisper's Sub-token Decoder Across Diverse Language Resource Levels

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates decoding fairness of multilingual ASR models (e.g., Whisper) across languages with varying resource levels, focusing on cross-lingual disparities in subword-level decoding mechanisms. Method: We propose a fine-grained, subword-unit-based probing framework that moves beyond aggregate error-rate analysis. Integrating beam-path tracing, predictive entropy estimation, probabilistic distribution modeling, and PCA/t-SNE visualization of subword hypothesis spaces, we analyze decoding behavior at the subword level. Contribution/Results: We find that high-resource languages exhibit high decoding confidence and broad hypothesis diversity, whereas low-resource languages—despite lower overall accuracy—display pronounced clustering in subword usage patterns, strongly correlated with linguistic typological features. These systematic, language-specific biases remain obscured in conventional metrics. Our approach establishes an interpretable paradigm for ASR fairness evaluation and informs decoding optimization strategies tailored to low-resource languages.
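The predictive-entropy probing described above can be sketched in a few lines: at each decoding step, the softmax distribution over sub-token candidates yields a Shannon entropy, and the summary's finding is that high-resource languages produce peaked (low-entropy) distributions while low-resource languages produce flatter ones. This is an illustrative sketch with toy logits, not the paper's implementation; the function name and vocabulary size are assumptions.

```python
import numpy as np

def predictive_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over sub-tokens."""
    z = logits - logits.max()            # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()      # softmax
    return float(-(p * np.log(p + 1e-12)).sum())

# Toy decoding steps over a 5-token vocabulary: a peaked distribution
# (stand-in for a high-resource language) vs a flat one (low-resource).
confident = np.array([8.0, 1.0, 0.5, 0.2, 0.1])
uncertain = np.array([2.0, 1.8, 1.7, 1.5, 1.4])

print(predictive_entropy(confident) < predictive_entropy(uncertain))  # True
```

A uniform distribution attains the maximum entropy log(V) for a vocabulary of size V, which gives a natural upper bound when comparing decoding confidence across languages.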

📝 Abstract
While large multilingual automatic speech recognition (ASR) models achieve remarkable performance, the internal mechanisms of the end-to-end pipeline, particularly concerning fairness and efficacy across languages, remain underexplored. This paper introduces a fine-grained analysis of Whisper's multilingual decoder, examining its sub-token hypotheses during transcription across languages with varying resource levels. Our method traces the beam search path, capturing sub-token guesses and their associated probabilities. Results reveal that higher-resource languages benefit from a higher likelihood of the correct token being top-ranked, greater confidence, lower predictive entropy, and more diverse alternative candidates. Lower-resource languages fare worse on these metrics, but also exhibit distinct clustering patterns in sub-token usage, sometimes shaped by typology, in our PCA and t-SNE analyses. This sub-token probing uncovers systematic decoding disparities masked by aggregate error rates and points towards targeted interventions to ameliorate the imbalanced development of speech technology.
Problem

Research questions and friction points this paper is trying to address.

Analyzes Whisper's sub-token decoding disparities across language resource levels
Investigates systematic decoding differences masked by aggregate error rates
Examines how resource availability affects ASR confidence and token ranking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes sub-token hypotheses during transcription
Traces the beam search path, capturing sub-token probabilities
Uses PCA and t-SNE to reveal clustering patterns in sub-token usage
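The PCA step above can be sketched as follows: each language is represented by a vector of sub-token usage frequencies, and projecting these vectors onto the first two principal components makes per-resource-level clustering visible. The data here is synthetic and the feature layout is an assumption for illustration; it is not the paper's actual feature set.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)                        # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                           # 2-D coordinates per row

# Toy per-language sub-token frequency profiles (rows = languages,
# columns = relative usage of 4 hypothetical sub-token classes).
rng = np.random.default_rng(0)
high_res = rng.normal([0.4, 0.3, 0.2, 0.1], 0.05, size=(3, 4))
low_res = rng.normal([0.7, 0.1, 0.1, 0.1], 0.02, size=(3, 4))  # tighter spread

coords = pca_2d(np.vstack([high_res, low_res]))
print(coords.shape)  # (6, 2) — one 2-D point per language
```

t-SNE would replace the SVD projection with a non-linear embedding (e.g. `sklearn.manifold.TSNE`), which better preserves local cluster structure but does not give axis directions with a variance interpretation.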
Siyu Liang
Nanjing University of Science and Technology
stochastic analysis, deep learning, partial differential equations
Nicolas Ballier
ALTAE, Université Paris Cité, F-75013 Paris, France
Gina-Anne Levow
University of Washington
Richard Wright
Department of Linguistics, University of Washington