🤖 AI Summary
Existing context-biasing methods often require additional training, incur decoding latency, or lack compatibility with diverse ASR architectures. This paper proposes a universal, training-free shallow fusion framework compatible with the mainstream ASR architectures: CTC, Transducer, and attention-based encoder-decoder models. Its core innovation is a GPU-accelerated word boosting tree, which enables efficient biasing over keyword lists of up to 20,000 entries while preserving near-native decoding speed and significantly improving recognition accuracy for key phrases. The method supports both greedy and beam search without modification and has been integrated into the NeMo toolkit. Experiments across multiple ASR systems show that the approach outperforms existing open-source biasing solutions in both accuracy and decoding efficiency.
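To make the boosting-tree idea concrete, here is a minimal sketch of a prefix tree (trie) over keyword token sequences that adds a per-token bonus to hypotheses during shallow fusion. All names and the scoring scheme (`BoostTree`, a flat per-token `boost`) are illustrative assumptions, not NeMo's actual implementation, which additionally runs the matching on the GPU in batch.

```python
# Illustrative sketch of trie-based shallow fusion word boosting.
# Names and the flat per-token bonus are assumptions, not NeMo's API.

class BoostTree:
    """Prefix tree over keyword token sequences with a per-token bonus."""

    def __init__(self, boost=2.0):
        self.root = {}
        self.boost = boost

    def add(self, tokens):
        """Insert one key phrase, given as a sequence of (sub)word tokens."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})
        node["$"] = True  # mark end of a key phrase

    def score(self, hypothesis):
        """Total bonus for a hypothesis: each token that extends a
        partial keyword match (or starts a new one) earns `boost`."""
        total, node = 0.0, self.root
        for t in hypothesis:
            if t in node:          # extend the current partial match
                node = node[t]
                total += self.boost
            elif t in self.root:   # restart a match at this token
                node = self.root[t]
                total += self.boost
            else:                  # no match: fall back to the root
                node = self.root
        return total

tree = BoostTree(boost=2.0)
tree.add(["ne", "mo"])     # e.g. subword tokens for "NeMo"
tree.add(["c", "t", "c"])

# During decoding, the biased score of a hypothesis would be:
#   combined = log_p_asr(hyp_tokens) + tree.score(hyp_tokens)
print(tree.score(["ne", "mo"]))        # 4.0 (two matched tokens)
print(tree.score(["hello", "world"]))  # 0.0 (no keyword prefix)
```

In a real decoder the tree state would be carried per hypothesis and updated incrementally at each step rather than rescored from scratch, which is what allows the paper's approach to keep decoding speed intact even with ~20K key phrases.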
📝 Abstract
Recognizing specific key phrases is an essential task for contextualized Automatic Speech Recognition (ASR). However, most existing context-biasing approaches either require additional model training, significantly slow down the decoding process, or constrain the choice of the ASR system type. This paper proposes a universal ASR context-biasing framework that supports all major model types: CTC, Transducer, and Attention Encoder-Decoder models. The framework is based on a GPU-accelerated word boosting tree, which enables its use in shallow fusion mode for greedy and beam search decoding without noticeable speed degradation, even with a vast number of key phrases (up to 20K items). The obtained results show the high efficiency of the proposed method, which surpasses the considered open-source context-biasing approaches in accuracy and decoding speed. Our context-biasing framework is open-sourced as a part of the NeMo toolkit.