Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores

📅 2024-06-06

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0


🤖 AI Summary

This paper targets zero-shot Mandarin–English code-switching ASR, where strong cross-lingual interference makes code-switched speech hard to model. It proposes a kNN-CTC framework with two monolingual datastores and a gated selection mechanism. For each frame, the gate routes retrieval to the datastore of the matching language, so the contextual information injected into decoding is language-specific. Each datastore indexes frame embeddings from monolingual data, with frame labels constrained by CTC alignment, so no mixed-language data is needed to build it. Unlike a single bilingual datastore, the gated design keeps the two languages' representations separate, which reduces interference from the other language. Experiments show that the approach outperforms baselines on zero-shot code-switching ASR without any Mandarin–English code-switching training data, which makes it a promising option for low-resource bilingual speech recognition.
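The datastore construction described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the CTC blank has token id 0, that each monolingual utterance is given as a list of frame embeddings plus per-frame CTC scores, and that the greedy (argmax) CTC token serves as each frame's pseudo label, with blank frames skipped. `Datastore` and `build_datastore` are hypothetical names.

```python
from dataclasses import dataclass, field

BLANK = 0  # assumption: the CTC blank token has id 0


@dataclass
class Datastore:
    keys: list = field(default_factory=list)    # frame embeddings (retrieval keys)
    values: list = field(default_factory=list)  # CTC pseudo-label token ids

    def add(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def build_datastore(utterances, blank=BLANK):
    """Index frame embeddings from monolingual utterances (hypothetical helper).

    Each utterance is a pair (embeddings, scores): a list of T embedding
    vectors and a list of T per-token score lists from the CTC head. The
    greedy CTC token labels each frame; blank-labelled frames are skipped.
    """
    store = Datastore()
    for embeddings, scores in utterances:
        for emb, frame_scores in zip(embeddings, scores):
            label = max(range(len(frame_scores)), key=frame_scores.__getitem__)
            if label != blank:
                store.add(emb, label)
    return store
```

Calling `build_datastore` once on Mandarin-only data and once on English-only data yields the two monolingual datastores. A practical system would replace the Python lists with an approximate nearest-neighbour index such as FAISS.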


📝 Abstract

The kNN-CTC model has proven to be effective for monolingual automatic speech recognition (ASR). However, its direct application to multilingual scenarios such as code-switching presents challenges. Although there is potential for performance improvement, a kNN-CTC model utilizing a single bilingual datastore can inadvertently introduce undesirable noise from the alternative language. To address this, we propose a novel kNN-CTC-based code-switching ASR (CS-ASR) framework that employs dual monolingual datastores and a gated datastore selection mechanism to reduce noise interference. Our method selects the appropriate datastore for decoding each frame, ensuring the injection of language-specific information into the ASR process. We apply this framework to cutting-edge CTC-based models, developing an advanced CS-ASR system. Extensive experiments demonstrate the remarkable effectiveness of our gated datastore mechanism in enhancing the performance of zero-shot Chinese-English CS-ASR.
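A single decoding step with gated datastore selection might look like the sketch below. It is illustrative only and rests on several assumptions: kNN-LM-style interpolation p = λ·p_kNN + (1 − λ)·p_CTC; a kNN distribution built from a softmax over negative squared L2 distances to the k nearest keys; and a hypothetical gate that routes each frame by the language of its greedy CTC token via a `token_lang` lookup. The abstract does not say how the gate actually decides. Blank frames keep the plain CTC distribution. All function names and the values of `lam`, `k`, and `temperature` are hypothetical.

```python
import math


def knn_distribution(query, keys, values, vocab_size, k=2, temperature=1.0):
    """Softmax over negative squared L2 distances of the k nearest keys,
    accumulated onto the token ids stored as their values."""
    nearest = sorted(
        (sum((q - c) ** 2 for q, c in zip(query, key)), value)
        for key, value in zip(keys, values)
    )[:k]
    d_min = nearest[0][0]  # shift distances so exp() cannot underflow to all zeros
    probs = [0.0] * vocab_size
    for dist, value in nearest:
        probs[value] += math.exp(-(dist - d_min) / temperature)
    total = sum(probs)
    return [p / total for p in probs]


def gated_knn_ctc_step(query, ctc_probs, stores, token_lang,
                       blank=0, lam=0.3, k=2):
    """Interpolate one frame's CTC posterior with kNN retrieval from the
    monolingual datastore chosen by a (hypothetical) language gate."""
    label = max(range(len(ctc_probs)), key=ctc_probs.__getitem__)
    if label == blank:
        return list(ctc_probs)  # blank frames: no retrieval
    keys, values = stores[token_lang[label]]  # gate: query one language only
    knn = knn_distribution(query, keys, values, len(ctc_probs), k)
    return [lam * pk + (1 - lam) * pc for pk, pc in zip(knn, ctc_probs)]
```

For example, with token 1 Mandarin and token 2 English, `stores = {"zh": ([[0.0, 0.0], [1.0, 1.0]], [1, 1]), "en": ([[0.0, 0.0]], [2])}` and CTC posterior `[0.1, 0.6, 0.3]`, the frame is routed to the Mandarin store only. The English key at the same location therefore cannot pull probability mass toward token 2, which is exactly the cross-lingual noise a single bilingual datastore would admit.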

Problem

Research questions and friction points this paper is trying to address.

Code-Switching

Speech Recognition

Noise Reduction

Innovation

Methods, ideas, or system contributions that make the work stand out.

kNN-CTC System

Code-Switching Speech Recognition

Gated Datastore Selector

🔎 Similar Papers

Cross-Lingual Transfer Learning for Speech Translation