Continuously Learning New Words in Automatic Speech Recognition

📅 2024-01-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Automatic speech recognition (ASR) systems still struggle with acronyms, named entities, and domain-specific terms because little or no labeled data exists for them. This paper proposes a self-supervised continual learning approach that uses the audio of a lecture talk together with its slides. A memory-enhanced ASR model from prior work is biased toward decoding new words that appear on the slides. Utterances in which such words are detected are collected into an adaptation data set, and adaptation weights added to the model are trained on it. The procedure is repeated over many talks and requires no manual annotation. Recall on new words rises the more often they occur, exceeding 80% for frequent words, while the general performance of the model is preserved.
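The recall figure above can be made concrete with a small sketch. The source does not say how recall is computed, so the definition here is an assumption: the fraction of reference occurrences of new words that also appear in the hypothesis for the same utterance, counted as a bag of words rather than by alignment. The function name `new_word_recall` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def new_word_recall(pairs: Iterable[Tuple[str, str]], new_words: Set[str]) -> float:
    """Recall over new-word occurrences in (reference, hypothesis) pairs.

    Assumption: per-utterance bag-of-words matching, not an edit-distance
    alignment. The paper may use a different definition.
    """
    total = 0  # new-word occurrences in the references
    found = 0  # of those, how many the hypotheses also contain
    for reference, hypothesis in pairs:
        ref_counts = Counter(w for w in reference.lower().split() if w in new_words)
        hyp_counts = Counter(w for w in hypothesis.lower().split() if w in new_words)
        for word, count in ref_counts.items():
            total += count
            found += min(count, hyp_counts[word])
    return found / total if total else 0.0


if __name__ == "__main__":
    pairs = [
        ("we use kubernetes and grpc", "we use kubernetes and g r p c"),
        ("kubernetes scales", "kubernetes scales"),
    ]
    print(new_word_recall(pairs, {"kubernetes", "grpc"}))
```

In this toy input, two of the three new-word occurrences are recovered. A general metric such as word error rate would be tracked separately to check that overall performance is preserved.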

📝 Abstract

Despite recent advances, Automatic Speech Recognition (ASR) systems are still far from perfect. Typical errors include acronyms, named entities, and domain-specific special words for which little or no labeled data is available. To address the problem of recognizing these words, we propose a self-supervised continual learning approach: Given the audio of a lecture talk with the corresponding slides, we bias the model towards decoding new words from the slides by using a memory-enhanced ASR model from the literature. Then, we perform inference on the talk, collecting utterances that contain detected new words into an adaptation data set. Continual learning is then performed by training adaptation weights added to the model on this data set. The whole procedure is iterated for many talks. We show that with this approach, we obtain increasing performance on the new words when they occur more frequently (more than 80% recall) while preserving the general performance of the model.
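The procedure described in the abstract can be sketched as a loop over talks. This is an illustrative outline only, not the authors' implementation. The callables `decode_biased` and `train_adapter` are hypothetical stand-ins for the memory-enhanced ASR model and the adaptation-weight training step. Detecting new words by checking slide tokens against a known vocabulary, and retraining on the accumulated set after every talk, are both assumptions.

```python
from typing import Callable, Iterable, List, Set, Tuple

Utterance = str  # stand-in for an audio segment; a real system would pass audio


def find_new_words(slide_text: str, known_vocab: Set[str]) -> Set[str]:
    """Slide tokens not covered by the known vocabulary (assumed heuristic)."""
    return {w for w in slide_text.lower().split() if w not in known_vocab}


def collect_adaptation_data(
    hypotheses: Iterable[Tuple[Utterance, str]], new_words: Set[str]
) -> List[Tuple[Utterance, str]]:
    """Keep (utterance, hypothesis) pairs whose hypothesis contains a new word."""
    return [
        (utt, hyp)
        for utt, hyp in hypotheses
        if new_words & set(hyp.lower().split())
    ]


def continual_learning(
    talks: Iterable[Tuple[List[Utterance], str]],
    known_vocab: Set[str],
    decode_biased: Callable[[Utterance, Set[str]], str],           # hypothetical
    train_adapter: Callable[[List[Tuple[Utterance, str]]], None],  # hypothetical
) -> List[Tuple[Utterance, str]]:
    """Iterate over talks: bias decoding, collect pseudo-labels, adapt."""
    adaptation_set: List[Tuple[Utterance, str]] = []
    for utterances, slide_text in talks:
        new_words = find_new_words(slide_text, known_vocab)
        # Decode each utterance with the model biased toward the slide words.
        hypotheses = [(utt, decode_biased(utt, new_words)) for utt in utterances]
        adaptation_set.extend(collect_adaptation_data(hypotheses, new_words))
        # Assumption: adaptation weights are retrained after every talk.
        train_adapter(adaptation_set)
    return adaptation_set
```

Because the hypotheses serve as labels, no manual transcription is needed. The quality of the adaptation set then depends on how reliably the biased decoder detects the new words.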

Problem

Research questions and friction points this paper is trying to address.

Automatic Speech Recognition

Novel Word Recognition

Data Sparsity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised Learning

Continual Learning

Speech Recognition
