Rare Word Recognition and Translation Without Fine-Tuning via Task Vector in Speech Models

📅 2025-12-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Fine-tuning approaches to rare-word recognition and translation in speech-to-text systems suffer from high computational cost, catastrophic forgetting, and poor scalability. Method: This paper proposes a training-free, word-level capability-editing paradigm that introduces task vectors into end-to-end speech encoder-decoder models, reportedly for the first time. Each task vector represents a word-level parameter difference, and vector-space arithmetic on these differences enables zero-shot rare-word enhancement without further parameter updates. Task vectors can be composed, so capabilities for several words can be combined and novel words accommodated at inference time. Contribution/Results: Experiments show that the approach matches or surpasses fine-tuned baselines on domain-specific rare words, improves general performance by about 5 BLEU, and mitigates catastrophic forgetting. Deployment overhead is described as negligible, which reduces practical implementation cost.

📝 Abstract
Rare words remain a critical bottleneck for speech-to-text systems. While direct fine-tuning improves recognition of target words, it often incurs high cost, catastrophic forgetting, and limited scalability. To address these challenges, we propose a training-free paradigm based on task vectors for rare word recognition and translation. By defining task vectors as parameter differences and introducing word-level task vector arithmetic, our approach enables flexible composition of rare-word capabilities, greatly enhancing scalability and reusability. Extensive experiments across multiple domains show that the proposed method matches or surpasses fine-tuned models on target words, improves general performance by about 5 BLEU, and mitigates catastrophic forgetting.
Problem

Research questions and friction points this paper is trying to address.

Recognizing rare words in speech without fine-tuning
Scaling rare-word translation through task vector arithmetic
Mitigating catastrophic forgetting while improving general performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task vector arithmetic enables rare word recognition
Training-free paradigm enhances scalability and reusability
Word-level composition matches fine-tuning without forgetting
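
The abstract defines task vectors as parameter differences and composes them with word-level arithmetic. A minimal sketch of that idea follows, using toy parameter dictionaries instead of a real speech model. The function names, the `scale` coefficient, and the assumption that each word-level vector comes from a separately adapted checkpoint are illustrative and are not taken from the paper.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(base: Params, tuned: Params) -> Params:
    """Task vector = adapted parameters minus base parameters, per tensor."""
    return {k: [t - b for t, b in zip(tuned[k], base[k])] for k in base}


def apply_task_vectors(base: Params, vectors: List[Params], scale: float = 1.0) -> Params:
    """Add the scaled sum of task vectors to a copy of the base parameters."""
    edited = {k: list(v) for k, v in base.items()}  # leave `base` untouched
    for vec in vectors:
        for k, delta in vec.items():
            edited[k] = [p + scale * d for p, d in zip(edited[k], delta)]
    return edited


# Hypothetical checkpoints: one base model and two word-specific variants.
base = {"enc.w": [0.0, 1.0], "dec.w": [2.0, 2.0]}
tuned_word_a = {"enc.w": [0.5, 1.0], "dec.w": [2.0, 3.0]}
tuned_word_b = {"enc.w": [0.0, 0.0], "dec.w": [1.0, 2.0]}

vec_a = task_vector(base, tuned_word_a)
vec_b = task_vector(base, tuned_word_b)

# Compose both word-level capabilities into one edited parameter set.
merged = apply_task_vectors(base, [vec_a, vec_b], scale=1.0)
```

Because the vectors are stored separately from the base weights, any subset can be added or left out without retraining, which is the kind of reuse the summary attributes to the method.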
Authors

Ruihao Jing
Institute of Artificial Intelligence (TeleAI), China Telecom

Cheng Gong
Institute of Artificial Intelligence (TeleAI), China Telecom

Yu Jiang
Institute of Artificial Intelligence (TeleAI), China Telecom

Boyu Zhu
Institute of Artificial Intelligence (TeleAI), China Telecom

Shansong Liu
TeleAI
Music AI, TTS, LLM, Multi-modal LLM, Audio codec

Chi Zhang
Institute of Artificial Intelligence (TeleAI), China Telecom

Xiao-Lei Zhang
Professor, Northwestern Polytechnical University, China
Speech Processing, Machine Learning, Signal Processing

Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom