🤖 AI Summary
This work addresses the performance degradation of automatic speech recognition (ASR) systems on non-native accented speech by proposing a modeling approach grounded in the Interlanguage Speech Intelligibility Benefit (ISIB) hypothesis. The method integrates multilingual multitask learning—leveraging both the speaker’s native language (L1) and the target second language (L2)—with differentiable K-means clustering in a self-supervised speech representation space to produce accent-robust discrete phoneme-like tokens for ASR training. Notably, this is the first approach to jointly optimize differentiable clustering and L1–L2 multitask learning in an end-to-end framework. The model demonstrates strong generalization: it outperforms baseline systems using only native-language data and achieves approximately a 20% relative improvement in recognition accuracy when supplemented with a small amount of accented speech data.
📝 Abstract
Building ASR systems robust to foreign-accented speech is an important challenge in today's globalized world. A prior study explored how to enhance the performance of phonetic token-based ASR on accented speech by reproducing the phenomenon known as the interlanguage speech intelligibility benefit (ISIB), in which foreign-accented speech is more intelligible to listeners who share the speaker's native language than to native listeners of the target language. ISIB was technically implemented by using the speaker's L1 to learn k-means cluster centroids in an SSL feature space, from which phonetic tokens were obtained. In this study, we propose a more advanced modeling of ISIB. By employing differentiable k-means and optimizing the entire module for both L1 and L2 ASR, the proposed method outperformed the baselines, both when using only native speech and when additionally incorporating a limited amount of accented speech. Notably, in the latter scenario, our method achieved approximately a 20% relative improvement in recognition accuracy.
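The abstract describes differentiable k-means only at a high level. One common way to make k-means assignment differentiable is to replace the hard nearest-centroid choice with a temperature-controlled softmax over negative squared distances, so gradients can flow back to both the centroids and the upstream SSL encoder. A minimal NumPy sketch under that assumption (the function name, the temperature `tau`, and the soft-quantization step are illustrative, not taken from the paper):

```python
import numpy as np

def soft_kmeans_assign(features, centroids, tau=0.1):
    """Soft (differentiable) cluster assignments.

    features:  (T, D) frame-level SSL representations
    centroids: (K, D) learnable cluster centroids
    tau:       softmax temperature; tau -> 0 recovers hard k-means
    Returns (T, K) assignment probabilities.
    """
    # Squared Euclidean distance from every frame to every centroid: (T, K)
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 4))   # 5 frames, 4-dim features
cents = rng.normal(size=(3, 4))   # 3 clusters

probs = soft_kmeans_assign(feats, cents)
tokens = probs.argmax(axis=1)     # discrete phoneme-like token IDs
quantized = probs @ cents         # soft-quantized features; differentiable
```

In an end-to-end setup such as the one the paper proposes, the assignment probabilities (or the soft-quantized features) would feed the downstream L1 and L2 ASR losses, letting the shared clustering module be optimized jointly for both tasks.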