Pushing the Limits of Beam Search Decoding for Transducer-based ASR models

📅 2025-05-30
🤖 AI Summary
Transducer models achieve state-of-the-art performance in end-to-end automatic speech recognition (ASR), but standard beam search decoding is much slower than greedy decoding. This work proposes a general-purpose beam search acceleration framework for Transducers and uses it to implement two efficient algorithms, ALSD++ and AES++. Key contributions include: (1) a tree-structured hypothesis representation for compact management of hypothesis state; (2) a new blank scoring mechanism that makes shallow fusion more effective; and (3) GPU optimization via CUDA graph execution and batched operations. Experiments show that the accelerated beam search is only 10-20% slower than greedy decoding for the whole system, while improving word error rate (WER) by 14-30% relative to greedy decoding. In low-resource settings, shallow fusion improves by up to 11% compared to existing implementations. The implementation is open-sourced.

๐Ÿ“ Abstract
Transducer models have emerged as a promising choice for end-to-end ASR systems, offering a balanced trade-off between recognition accuracy, streaming capabilities, and inference speed in greedy decoding. However, beam search significantly slows down Transducers due to repeated evaluations of key network components, limiting practical applications. This paper introduces a universal method to accelerate beam search for Transducers, enabling the implementation of two optimized algorithms: ALSD++ and AES++. The proposed method utilizes batch operations, a tree-based hypothesis structure, novel blank scoring for enhanced shallow fusion, and CUDA graph execution for efficient GPU inference. This narrows the speed gap between beam and greedy modes to only 10-20% for the whole system, achieves a 14-30% relative improvement in WER compared to greedy decoding, and improves shallow fusion in low-resource settings by up to 11% compared to existing implementations. All the algorithms are open-sourced.
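The tree-based hypothesis structure named in the abstract can be pictured with a small sketch. The idea illustrated here is generic: instead of copying a full token list for every beam, each hypothesis is a node that stores one token and a pointer to its parent, so beams that share a prefix also share storage. The class name `HypothesisTree`, its methods, and the `ROOT` sentinel are hypothetical and chosen for illustration; this is not the paper's data structure, which operates on batched GPU tensors.

```python
from typing import List

ROOT = -1  # assumed sentinel: parent index of the empty prefix


class HypothesisTree:
    """Hypothetical prefix tree: one node per emitted token."""

    def __init__(self) -> None:
        self.parents: List[int] = []  # parents[i] is the node that node i extends
        self.tokens: List[int] = []   # tokens[i] is the token emitted at node i

    def extend(self, parent: int, token: int) -> int:
        """Append `token` to the hypothesis ending at `parent`; return the new node id."""
        self.parents.append(parent)
        self.tokens.append(token)
        return len(self.tokens) - 1

    def backtrace(self, node: int) -> List[int]:
        """Recover the full token sequence by following parent pointers to the root."""
        out: List[int] = []
        while node != ROOT:
            out.append(self.tokens[node])
            node = self.parents[node]
        return out[::-1]


tree = HypothesisTree()
a = tree.extend(ROOT, 7)  # hypothesis [7]
b = tree.extend(a, 3)     # hypothesis [7, 3]
c = tree.extend(a, 5)     # hypothesis [7, 5], reusing the stored prefix [7]

print(tree.backtrace(b), tree.backtrace(c))
```

Extending a beam costs one append regardless of hypothesis length, and the full transcript is only materialized once, at the end of decoding, by a backtrace.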
Problem

Research questions and friction points this paper is trying to address.

Accelerates beam search for Transducer ASR models
Reduces speed gap between beam and greedy decoding
Improves word error rate and shallow fusion performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Batch operations and tree-based hypothesis structure
Novel blank scoring for enhanced shallow fusion
CUDA graph execution for efficient GPU inference
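To see why blank scoring matters for shallow fusion, it may help to look at the conventional baseline rather than the paper's method. An external language model is trained on text and has no blank symbol, so a common approach adds the weighted LM log-probability only to non-blank tokens and leaves the blank score untouched. The sketch below shows that baseline only; the function name `fuse_scores`, the blank index, and all probabilities are made-up assumptions, and the paper's novel blank scoring is not reproduced here.

```python
import math
from typing import Dict, List

BLANK = 0  # assumed index of the blank token in the Transducer vocabulary


def fuse_scores(asr_logp: List[float], lm_logp: Dict[int, float], lm_weight: float) -> List[float]:
    """Baseline shallow fusion: add weighted LM scores to non-blank tokens only."""
    fused = []
    for tok, lp in enumerate(asr_logp):
        if tok == BLANK:
            fused.append(lp)  # the LM has no blank symbol, so blank is left as-is
        else:
            fused.append(lp + lm_weight * lm_logp[tok])
    return fused


# Illustrative numbers only: index 0 is blank, indices 1 and 2 are real tokens.
asr = [math.log(0.5), math.log(0.3), math.log(0.2)]
lm = {1: math.log(0.1), 2: math.log(0.9)}
fused = fuse_scores(asr, lm, lm_weight=0.5)

# Token with the highest fused score among the non-blank candidates.
best_non_blank = max((1, 2), key=lambda t: fused[t])
print(fused, best_non_blank)
```

In this toy case the LM flips the ranking of the two non-blank tokens, but every non-blank score is also pushed down relative to blank, since LM log-probabilities are never positive. That imbalance between blank and non-blank scores is the kind of issue a dedicated blank scoring scheme has to address.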