🤖 AI Summary
To address the poor GPU parallelization and high industrial deployment cost of conventional n-gram models in ASR context biasing, this paper proposes the first general-purpose, low-overhead (<7%) greedy decoding framework for context biasing. Methodologically, it reformulates n-grams into a GPU-friendly compact index structure, introduces a lightweight bias-integration mechanism, and achieves cross-architecture decoder compatibility with mainstream ASR models, including transducer, attention-based encoder-decoder, and CTC architectures. Experiments demonstrate that the framework closes over 50% of the accuracy gap between greedy decoding and beam search in out-of-domain scenarios and avoids the latency penalty of beam search, while remaining model-agnostic and simple to deploy. The implementation is open-sourced.
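To make the "GPU-friendly compact index" idea concrete, here is a minimal sketch of one way such an index could look: bigram statistics packed into two flat, sorted tensors so that all LM lookups for a batch become a single batched binary search on the GPU. This is an illustration under stated assumptions (PyTorch backend, bigrams only, a fixed backoff score); `FlatBigramLM` and its fields are hypothetical names, not the actual NGPU-LM data structure.

```python
import torch

class FlatBigramLM:
    """Toy compact index: bigram log-probs packed into two flat, sorted
    1-D tensors so that LM queries become one batched binary search."""

    def __init__(self, bigram_logprobs, vocab_size, backoff_logprob=-10.0):
        # Fuse (context_token, next_token) into a single integer key;
        # sorting the keys lets torch.searchsorted locate any entry.
        keys = torch.tensor(
            [ctx * vocab_size + nxt for (ctx, nxt) in bigram_logprobs],
            dtype=torch.long,
        )
        vals = torch.tensor([bigram_logprobs[k] for k in bigram_logprobs])
        order = torch.argsort(keys)
        self.keys, self.vals = keys[order], vals[order]
        self.vocab_size = vocab_size
        self.backoff = backoff_logprob  # crude stand-in for real backoff

    def scores(self, last_tokens):
        """log P(next | last) for every hypothesis in the batch and every
        vocabulary entry, via one searchsorted call (no Python loop).
        Assumes self.keys/self.vals are on the same device as last_tokens."""
        nxt = torch.arange(self.vocab_size, device=last_tokens.device)
        query = (last_tokens[:, None] * self.vocab_size + nxt[None, :]).reshape(-1)
        pos = torch.searchsorted(self.keys, query).clamp(max=self.keys.numel() - 1)
        hit = self.keys[pos] == query
        found = self.vals[pos]
        out = torch.where(hit, found, torch.full_like(found, self.backoff))
        return out.view(-1, self.vocab_size)

# Usage: scores for a batch of two hypotheses whose last token is 5.
lm = FlatBigramLM({(5, 7): -0.2, (5, 9): -1.6}, vocab_size=32)
lm_scores = lm.scores(torch.tensor([5, 5]))  # shape (2, 32)
```

The design choice being illustrated: replacing pointer-chasing trie traversal with sorted flat tensors turns per-hypothesis lookups into one data-parallel `searchsorted`, which is what keeps the per-step overhead small on a GPU.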
📖 Abstract
Statistical n-gram language models are widely used for context-biasing tasks in Automatic Speech Recognition (ASR). However, existing implementations lack computational efficiency due to poor parallelization, making context-biasing less appealing for industrial use. This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types - including transducers, attention encoder-decoder models, and CTC - with less than 7% computational overhead. The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search. The implementation of the proposed NGPU-LM is open-sourced.
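As a hedged illustration of how such LM scores could plug into greedy decoding, the sketch below fuses per-step ASR log-probabilities with weighted LM scores before the argmax, in the style of shallow fusion. `biased_greedy_step` and `lm_weight` are hypothetical names, and the actual NGPU-LM integration (e.g., blank handling for CTC and transducer decoding) is more involved than this single step.

```python
import torch

def biased_greedy_step(asr_logprobs, lm_scores, lm_weight=0.5):
    """One greedy decoding step with shallow-fusion-style biasing.

    asr_logprobs: (B, V) log-probabilities from the ASR decoder step
    lm_scores:    (B, V) n-gram LM scores, e.g. from FlatBigramLM.scores
    Returns:      (B,) next-token id per hypothesis
    """
    # Fused scores stay on the GPU; one add and one argmax per step is
    # why the overhead stays close to plain greedy decoding.
    fused = asr_logprobs + lm_weight * lm_scores
    return fused.argmax(dim=-1)
```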