OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low accuracy of speech foundation models (SFMs) on rare or unseen words, this paper proposes a lightweight contextual biasing (CB) method that integrates a dynamic-vocabulary mechanism into the Open Whisper-Style Speech Model (OWSM v3.1) while keeping its pre-trained parameters frozen. Bias phrases are injected as dynamic vocabulary entries at inference time, enabling on-the-fly domain adaptation within the end-to-end encoder-decoder architecture without fine-tuning the SFM, and the biasing component can be trained effectively even on a small dataset. The core idea is to leverage pre-trained knowledge while supporting dynamic, on-the-fly vocabulary customization, balancing generalization and flexibility. On the LibriSpeech 100 test-clean set, the biasing word error rate (B-WER) improves by 11.6 points, the overall WER improves by 0.9 points, and the real-time factor is reduced by 7.5% compared to the non-biasing baseline.

📝 Abstract
Speech foundation models (SFMs), such as Open Whisper-Style Speech Models (OWSM), are trained on massive datasets to achieve accurate automatic speech recognition. However, even SFMs struggle to accurately recognize rare and unseen words. While contextual biasing (CB) is a promising approach to improve recognition of such words, most CB methods are trained from scratch, resulting in lower performance than SFMs due to the lack of pre-trained knowledge. This paper integrates an existing CB method with OWSM v3.1 while freezing its pre-trained parameters. By leveraging the knowledge embedded in SFMs, the proposed method enables effective CB while preserving the advantages of SFMs, even with a small dataset. Experimental results show that the proposed method improves the biasing word error rate (B-WER) by 11.6 points, resulting in a 0.9 point improvement in the overall WER while reducing the real-time factor by 7.5% compared to the non-biasing baseline on the LibriSpeech 100 test-clean set.
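To make the dynamic-vocabulary idea concrete, here is a minimal sketch of how a decoder's output scoring can be extended with phrase-level bias tokens at inference time. This is an illustrative simplification, not the paper's implementation: the vocabulary, embedding sizes, and mean-pooling of subword embeddings into a phrase embedding are all assumptions for the example.

```python
import numpy as np

# Hypothetical static subword vocabulary and embedding table
# (assumption: a real SFM uses learned embeddings of size d_model;
# here d=4 and a tiny vocabulary keep the sketch readable).
STATIC_VOCAB = ["<s>", "the", "cat", "sat", "hyp", "no", "##tism"]
d = 4
rng = np.random.default_rng(0)
static_emb = rng.normal(size=(len(STATIC_VOCAB), d))

def phrase_embedding(phrase_subwords, vocab, emb):
    """Pool subword embeddings into one dynamic-token embedding (mean pool)."""
    idx = [vocab.index(s) for s in phrase_subwords]
    return emb[idx].mean(axis=0)

def extended_logits(decoder_state, bias_phrases):
    """Score over the static vocab plus one dynamic token per bias phrase.

    The bias list can change per utterance with no retraining: only the
    output scoring layer is extended, the SFM parameters stay frozen.
    """
    dyn_emb = np.stack([phrase_embedding(p, STATIC_VOCAB, static_emb)
                        for p in bias_phrases])
    full_emb = np.vstack([static_emb, dyn_emb])
    return full_emb @ decoder_state  # inner-product scoring

# Bias list supplied at inference time, e.g. the rare word "hypnotism".
bias = [["hyp", "no", "##tism"]]
state = rng.normal(size=d)
logits = extended_logits(state, bias)
print(logits.shape)  # one score per static subword plus one per bias phrase
```

In this sketch, recognizing a dynamic token emits the whole bias phrase in one step, which is one intuition for why such methods can reduce both B-WER and decoding time.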
Problem

Research questions and friction points this paper is trying to address.

Improving rare word recognition in speech models
Integrating contextual biasing with pre-trained SFMs
Enhancing accuracy without retraining SFM parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates an existing contextual biasing method with OWSM v3.1
Freezes pre-trained parameters, preserving SFM knowledge
Improves B-WER by 11.6 points and overall WER by 0.9 points