Adapting Language Balance in Code-Switching Speech

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models exhibit limited performance in code-switching speech recognition, primarily due to the infrequent occurrence of switch points, weak representation of second-language embeddings, and contextual bias interfering with generation. To address this, we propose a novel paradigm that explicitly models switch points: (1) a differentiable embedding-difference proxy signal is designed as fine-grained supervision to guide the model toward language transition locations; and (2) contrastive learning is employed to enhance discriminability of infrequent switch segments. Our approach requires no additional annotations, relying solely on self-supervised signals for precise switch-point localization. Evaluated on Arabic–English and Mandarin–English code-switching benchmarks, the method significantly reduces substitution errors, improves switch-position prediction accuracy, mitigates contextual bias, and enhances robustness to low-frequency language switches.
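The embedding-difference proxy described above can be illustrated with a minimal sketch. This is not the authors' code: the function name, the use of a plain L2 norm over consecutive frame embeddings, and the toy data are all assumptions made for illustration; the idea is simply that a differentiable signal derived from embedding differences peaks where the language changes.

```python
import numpy as np

def switch_proxy(embeddings: np.ndarray) -> np.ndarray:
    """embeddings: (T, D) sequence of frame/token embeddings.
    Returns a (T-1,) signal in [0, 1] that is large where consecutive
    embeddings differ most -- a rough switch-point indicator."""
    diffs = np.linalg.norm(np.diff(embeddings, axis=0), axis=1)
    return diffs / (diffs.max() + 1e-8)

# Toy example: two "languages" as well-separated clusters in embedding space.
rng = np.random.default_rng(0)
lang_a = rng.normal(0.0, 0.05, size=(5, 8))   # frames 0-4: language A
lang_b = rng.normal(1.0, 0.05, size=(5, 8))   # frames 5-9: language B
seq = np.concatenate([lang_a, lang_b])
signal = switch_proxy(seq)
print(int(signal.argmax()))  # -> 4: the jump between frames 4 and 5
```

Because the signal is a differentiable function of the embeddings, it can serve as fine-grained supervision without any manual switch-point annotation, which is the property the summary emphasizes.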

📝 Abstract
Despite achieving impressive results on standard benchmarks, large foundational models still struggle on code-switching test cases. When data scarcity cannot serve as the usual justification for poor performance, the reason may lie in the infrequent occurrence of code-switched moments, where the embedding of the second language appears only subtly. Instead of expecting the models to learn this infrequency on their own, it may be beneficial to provide the training process with labels. Evaluating model performance on code-switching data requires careful localization of code-switching points, where recognition errors are most consequential, so that the analysis emphasizes mistakes occurring at those moments. Building on this observation, we leverage the difference between the embedded and the main language to highlight those code-switching points and thereby emphasize learning at those locations. This simple yet effective differentiable surrogate mitigates context bias during generation -- the central challenge in code-switching -- thereby improving the model's robustness. Our experiments with Arabic–English and Chinese–English showed that the models predict the switching points more accurately, reflected by the reduced substitution error.
Problem

Research questions and friction points this paper is trying to address.

Improving language model performance on code-switching speech
Mitigating context bias during code-switching generation
Reducing substitution errors at language switching points
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using language difference to highlight code-switching points
Applying differentiable surrogate to mitigate context bias
Emphasizing learning at code-switching locations during training
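The summary also mentions contrastive learning to make infrequent switch segments more discriminable. The sketch below shows a generic InfoNCE-style loss of the kind such a scheme could use; the function, its arguments, and the toy segments are hypothetical, not the paper's implementation.

```python
import numpy as np

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: pull `anchor` toward `positive` (e.g. another
    switch segment) and away from `negatives` (e.g. monolingual segments)."""
    def unit(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)
    a, p, n = unit(anchor), unit(positive), unit(negatives)
    logits = np.concatenate([[a @ p], n @ a]) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # positive should out-score negatives

# Toy check: a switch-segment positive yields a lower loss than a
# monolingual segment used in its place.
anchor = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])            # similar to the anchor
negatives = np.array([[0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])         # dissimilar segments
loss_good = info_nce(anchor, positive, negatives)
loss_bad = info_nce(anchor, negatives[0], np.stack([positive, negatives[1]]))
print(loss_good < loss_bad)  # -> True
```

The intuition matches the innovation list: rare switch segments get an explicit training signal that separates them from the far more frequent monolingual material.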
Enes Yavuz Ugan
Interactive Systems Lab, Karlsruhe Institute of Technology (KIT), Germany
Ngoc-Quan Pham
InterACT, Carnegie Mellon University (CMU), USA
Alexander Waibel
Carnegie Mellon University (CMU), Karlsruhe Institute of Technology (KIT)
Machine Learning · Neural Networks · Speech Translation · Multimodal Interfaces