🤖 AI Summary
Existing fine-tuning approaches to rare-word recognition and translation in speech-to-text systems suffer from high computational cost, catastrophic forgetting, and poor scalability.
Method: This paper proposes a training-free, word-level capability editing paradigm—introducing task vectors into end-to-end speech encoder-decoder models for the first time. By representing each target word's capability as a parameter difference (a task vector) and performing arithmetic in parameter space, it enables zero-shot rare-word enhancement without any parameter updates. The method supports compositing multiple task vectors, allowing dynamic adaptation to novel words at inference time.
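The word-level task-vector arithmetic described above can be sketched as follows. This is a minimal illustration under the standard task-vector definition (τ = θ_fine-tuned − θ_base, applied as θ_base + α·τ); the dict-of-floats representation, function names, and toy values are illustrative stand-ins for real model state dicts, not the paper's implementation.

```python
# Sketch of task-vector arithmetic on model parameters, represented
# here as plain dicts mapping parameter names to floats. In practice
# these would be tensors (e.g. a PyTorch state_dict per checkpoint).

def task_vector(theta_base, theta_ft):
    """Per-parameter difference between a word-specific fine-tune and the base model."""
    return {k: theta_ft[k] - theta_base[k] for k in theta_base}

def compose(vectors, weights=None):
    """Weighted sum of task vectors, e.g. one vector per rare word."""
    if weights is None:
        weights = [1.0] * len(vectors)
    return {k: sum(w * v[k] for w, v in zip(weights, vectors))
            for k in vectors[0]}

def apply_vector(theta_base, tau, alpha=1.0):
    """Edit the base model without training: theta_base + alpha * tau."""
    return {k: theta_base[k] + alpha * tau[k] for k in theta_base}

# Toy example: compose two hypothetical word-level fine-tunes onto one base.
base      = {"w": 1.0, "b": 0.0}
ft_word_a = {"w": 1.5, "b": 0.2}   # fine-tuned on word A (illustrative)
ft_word_b = {"w": 0.8, "b": -0.1}  # fine-tuned on word B (illustrative)

tau = compose([task_vector(base, ft_word_a),
               task_vector(base, ft_word_b)])
edited = apply_vector(base, tau, alpha=0.5)
```

Because the edit is pure arithmetic on weights, adding or removing a word's capability at inference time costs one vector addition, with no gradient updates or retraining.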
Contribution/Results: It mitigates catastrophic forgetting and improves generalization. Experiments demonstrate that the approach matches or surpasses fine-tuned baselines in domain-specific rare-word recognition accuracy, while boosting overall translation quality by approximately 5 BLEU. Crucially, it achieves these gains with negligible deployment overhead, substantially reducing practical implementation costs.
📝 Abstract
Rare words remain a critical bottleneck for speech-to-text systems. While direct fine-tuning improves recognition of target words, it often incurs high cost, catastrophic forgetting, and limited scalability. To address these challenges, we propose a training-free paradigm based on task vectors for rare word recognition and translation. By defining task vectors as parameter differences and introducing word-level task vector arithmetic, our approach enables flexible composition of rare-word capabilities, greatly enhancing scalability and reusability. Extensive experiments across multiple domains show that the proposed method matches or surpasses fine-tuned models on target words, improves general performance by about 5 BLEU, and mitigates catastrophic forgetting.