🤖 AI Summary
Transducer models achieve state-of-the-art performance in end-to-end automatic speech recognition (ASR), but standard beam search decoding substantially increases inference latency. This work proposes the first general-purpose beam search acceleration framework for Transducers, unifying two efficient algorithms: ALSD++ and AES++. Key contributions include: (1) a tree-structured hypothesis representation enabling compact encoder-decoder state management; (2) an improved blank token scoring mechanism to enhance shallow fusion effectiveness; and (3) end-to-end GPU optimization via CUDA Graph integration and batched tensor operations. Experiments demonstrate that the accelerated beam search attains 80–90% of greedy decoding speed while reducing word error rate (WER) by 14–30% relative to greedy decoding. In low-resource settings, shallow fusion yields up to 11% WER improvement. The complete implementation is open-sourced.
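The tree-structured hypothesis representation mentioned above can be illustrated with a minimal sketch. This is not the paper's actual implementation; all names (`HypNode`, `extend`, `backtrace`) are illustrative. The idea is that each beam hypothesis stores only its last token plus a parent pointer, so shared prefixes are stored once and a full transcript is recovered by walking back to the root:

```python
# Hedged sketch of a prefix-tree hypothesis store for Transducer beam search.
# Each node holds one token and a parent pointer; extending a hypothesis
# creates a child node instead of copying the whole token sequence.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HypNode:
    token: Optional[int]          # last emitted token (None for the root)
    parent: Optional["HypNode"]   # previous hypothesis node
    score: float                  # cumulative log-probability


def extend(node: HypNode, token: int, logp: float) -> HypNode:
    """Create a child hypothesis; the shared prefix is referenced, not copied."""
    return HypNode(token=token, parent=node, score=node.score + logp)


def backtrace(node: HypNode) -> List[int]:
    """Recover the full token sequence by walking parent pointers to the root."""
    tokens: List[int] = []
    while node.parent is not None:
        tokens.append(node.token)
        node = node.parent
    return tokens[::-1]


# usage: two hypotheses sharing the prefix [7]
root = HypNode(token=None, parent=None, score=0.0)
h1 = extend(root, 7, -0.1)
h2a = extend(h1, 3, -0.5)
h2b = extend(h1, 4, -0.7)
print(backtrace(h2a))  # [7, 3]
print(backtrace(h2b))  # [7, 4]
```

This layout keeps per-hypothesis memory constant as hypotheses grow, which also makes it easier to keep decoder states in compact batched tensors indexed by node.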
📝 Abstract
Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capability, and inference speed in greedy decoding. However, beam search significantly slows down Transducers due to repeated evaluations of key network components, limiting practical applications. This paper introduces a universal method to accelerate beam search for Transducers, enabling the implementation of two optimized algorithms: ALSD++ and AES++. The proposed method utilizes batched operations, a tree-based hypothesis structure, a novel blank scoring mechanism for enhanced shallow fusion, and CUDA Graph execution for efficient GPU inference. This narrows the speed gap between beam and greedy modes to only 10–20% for the whole system, achieves a 14–30% relative improvement in WER compared to greedy decoding, and improves shallow fusion for low-resource languages by up to 11% compared to existing implementations. All the algorithms are open-sourced.
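To make the shallow-fusion setting concrete, here is a minimal sketch of per-step score fusion for a Transducer, showing why the blank token needs special scoring. This is the standard fusion formulation, not necessarily the paper's improved blank scoring; the constant `BLANK`, the function `fuse_scores`, and the toy probabilities are illustrative assumptions. An external language model assigns scores only to real tokens, so its contribution is added to non-blank log-probabilities only, which can bias the fused distribution toward blank:

```python
# Hedged sketch of shallow fusion at one Transducer decoding step.
# The LM has no notion of a blank symbol, so only non-blank tokens
# receive the weighted LM log-probability.

import math

BLANK = 0  # assumption: index 0 is the blank token


def fuse_scores(am_logps, lm_logps, lm_weight=0.3):
    """Combine acoustic (Transducer) and LM log-probs for one step."""
    fused = []
    for tok, am in enumerate(am_logps):
        if tok == BLANK:
            fused.append(am)  # blank: acoustic score only, no LM term
        else:
            fused.append(am + lm_weight * lm_logps[tok])
    return fused


# toy example: 3-symbol vocabulary (blank, "a", "b")
am = [math.log(0.5), math.log(0.3), math.log(0.2)]
lm = [0.0, math.log(0.6), math.log(0.4)]
print(fuse_scores(am, lm))
```

Because the LM term is a negative log-probability added only to non-blank entries, blank's relative score rises after fusion; a revised blank scoring mechanism, as the abstract describes, aims to correct this imbalance.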