🤖 AI Summary
Existing trie-based biasing methods for ASR face two key bottlenecks in recognizing rare words: reliance on beam search and computationally expensive score rollback mechanisms—e.g., pre-scoring partial matches like "Bon" only to revoke the score later if the full word "Bonham" is not generated. This work proposes a trie biasing method enhanced with K-step lookahead prediction: the model predicts the next K tokens before committing to a decoding step, which eliminates score rollbacks entirely and significantly reduces decoding complexity—especially beneficial for large-parameter models. Built upon the Whisper architecture, the method requires only 10 hours of synthetic data for fine-tuning and integrates trie-based context-aware biasing efficiently. On the NSC Part 2 test set, it reduces word error rate from 30.86% to 12.19%, demonstrating substantial improvements in both accuracy and inference efficiency.
📝 Abstract
Contextual biasing improves rare word recognition of ASR models by prioritizing the output of rare words during decoding. A common approach is trie-based biasing, which gives "bonus scores" to partial hypotheses (e.g., "Bon") that may lead to the generation of a rare word (e.g., "Bonham"). If the full word ("Bonham") is not ultimately recognized, the system revokes those earlier bonuses. This revocation only works with beam search and is computationally expensive, particularly for models with large decoders. To overcome these limitations, we propose adapting ASR models to look ahead and predict multiple steps at once. This avoids the revocation step entirely by better estimating whether a partial hypothesis will lead to the generation of the full rare word. By fine-tuning Whisper with only 10 hours of synthetic data, our method reduces the word error rate on the NSC Part 2 test set from 30.86% to 12.19%.
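The contrast between the rollback baseline and the lookahead alternative can be sketched as follows. This is a toy character-level illustration under assumed names (`build_trie`, `rollback_step`, `lookahead_bonus`, a fixed `BONUS`), not the paper's implementation:

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word_end = False

def build_trie(words):
    """Build a character trie over the biasing word list."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word_end = True
    return root

BONUS = 2.0  # illustrative per-token bonus; the real score is model-dependent

def rollback_step(root, state, token):
    """Baseline biasing: grant a bonus for each token that extends a trie
    prefix, but revoke all accumulated bonuses if the match breaks before
    reaching a word boundary."""
    node, accumulated = state
    child = node.children.get(token)
    if child is None:
        return (root, 0.0), -accumulated      # revocation
    if child.is_word_end:
        return (root, 0.0), BONUS             # word completed, bonuses kept
    return (child, accumulated + BONUS), BONUS

def lookahead_bonus(root, predicted_tokens):
    """Lookahead alternative: grant the bonus only when the K predicted
    tokens already reach a word boundary in the trie, so no bonus ever
    needs to be revoked."""
    node = root
    for tok in predicted_tokens:
        node = node.children.get(tok)
        if node is None:
            return 0.0
        if node.is_word_end:
            return BONUS
    return 0.0

root = build_trie(["bonham"])

# Rollback baseline: "bon" earns three bonuses that "x" then revokes.
state, total = (root, 0.0), 0.0
for tok in "bonx":
    state, delta = rollback_step(root, state, tok)
    total += delta
print(total)  # 0.0 — the partial-match bonuses were rolled back

# Lookahead: "bonx" never triggers a bonus; "bonham" does.
print(lookahead_bonus(root, "bonx"))    # 0.0
print(lookahead_bonus(root, "bonham"))  # 2.0
```

The baseline must carry per-hypothesis rollback state through beam search, whereas the lookahead check is a single trie walk over the predicted tokens, which is why it also works with greedy decoding.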