A Neural Model for Contextual Biasing Score Learning and Filtering

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of integrating external knowledge, such as user-specific phrases or entities, into the decoding process of personalized automatic speech recognition (ASR). We use an attention-based biasing decoder that scores candidate phrases from ASR encoder outputs; the scores serve both to filter out unlikely phrases and to compute the bonus for shallow-fusion biasing. A per-token discriminative loss trains the decoder to score ground-truth phrases higher and to suppress distractors. The approach is modular and can be used with any ASR system. Evaluated on the LibriSpeech biasing benchmark, it reduces the candidate phrase list by over 90% and substantially improves word error rate (WER) under different biasing conditions. The filtering mechanism could also benefit other biasing methods.

📝 Abstract
Contextual biasing improves automatic speech recognition (ASR) by integrating external knowledge, such as user-specific phrases or entities, during decoding. In this work, we use an attention-based biasing decoder to produce scores for candidate phrases based on acoustic information extracted by an ASR encoder. These scores can be used to filter out unlikely phrases and to calculate the bonus for shallow-fusion biasing. We introduce a per-token discriminative objective that encourages higher scores for ground-truth phrases while suppressing distractors. Experiments on the LibriSpeech biasing benchmark show that our method effectively filters out the majority of the candidate phrases and, when the scores are used in shallow-fusion biasing, significantly improves recognition accuracy under different biasing conditions. Our approach is modular and can be used with any ASR system, and the filtering mechanism can potentially boost the performance of other biasing methods.
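The two uses of the phrase scores described in the abstract (filtering and shallow-fusion bonus) can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the paper's implementation: the names `filter_phrases` and `biased_score`, the threshold of 0.5, the bonus weight of 2.0, and the score values are hypothetical, and the bonus is applied to complete hypotheses, whereas a real shallow-fusion system would typically apply it token by token during beam search.

```python
from typing import Dict


def filter_phrases(scores: Dict[str, float], threshold: float = 0.5) -> Dict[str, float]:
    """Keep only candidate phrases whose score reaches the (assumed) threshold."""
    return {phrase: s for phrase, s in scores.items() if s >= threshold}


def contains_phrase(hypothesis: str, phrase: str) -> bool:
    """True if the words of `phrase` occur contiguously in `hypothesis`."""
    words, target = hypothesis.split(), phrase.split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def biased_score(asr_log_prob: float, hypothesis: str,
                 kept: Dict[str, float], weight: float = 2.0) -> float:
    """Add a score-proportional bonus for every kept phrase found in the hypothesis."""
    bonus = sum(weight * s for phrase, s in kept.items()
                if contains_phrase(hypothesis, phrase))
    return asr_log_prob + bonus


# Hypothetical scores, standing in for the output of a biasing decoder.
scores = {"john smith": 0.9, "joan smythe": 0.1}
kept = filter_phrases(scores)          # only "john smith" survives filtering
print(kept)
print(biased_score(-5.0, "call john smith", kept))   # receives a bonus
print(biased_score(-4.0, "call joan smith", kept))   # no kept phrase, unchanged
```

In this toy case the hypothesis containing the kept phrase overtakes a competitor that started with a higher ASR log-probability, which is the intended effect of the bonus.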
Problem

Research questions and friction points this paper is trying to address.

Improves speech recognition by integrating external knowledge during decoding
Filters unlikely phrases using attention-based scores from acoustic information
Improves recognition accuracy under different biasing conditions through a modular design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-based decoder scores candidate phrases acoustically
Per-token discriminative objective suppresses distractors
Modular filtering mechanism works with any ASR system
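One way to picture a per-token discriminative objective is a binary cross-entropy over per-token scores, sketched below. This is an assumption and not the loss defined in the paper: the summary only states that ground-truth phrases should score higher and distractors lower. The function name `per_token_bce`, the use of raw logits, and the 0/1 labelling are hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def per_token_bce(logits: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy over phrase tokens.

    labels[i] is 1 if token i belongs to a ground-truth phrase and 0 if it
    belongs to a distractor (an assumed labelling scheme).
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(logits)


# Two ground-truth tokens followed by one distractor token.
labels = [1, 1, 0]
good = per_token_bce([3.0, 3.0, -3.0], labels)   # scores agree with labels
bad = per_token_bce([-3.0, -3.0, 3.0], labels)   # scores contradict labels
print(good < bad)
```

Minimizing a loss of this kind pushes scores up for ground-truth tokens and down for distractor tokens, which matches the behaviour the bullet list describes.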
Wanting Huang
Computer Science Department, University of Iowa
Weiran Wang
University of Iowa
Machine learning, speech processing