FlexCTC: GPU-powered CTC Beam Decoding with advanced Contextual Abilities

📅 2025-08-10
🤖 AI Summary
Conventional CTC beam search decoders typically run sequentially on the CPU, which leaves the GPU underused and adds CPU-GPU synchronization overhead. This paper presents FlexCTC, an open-source toolkit for fully GPU-based, batched CTC beam decoding, written entirely in Python and PyTorch rather than in C++, custom CUDA, or WFST frameworks. The decoder avoids CPU-GPU synchronization, uses CUDA Graphs to reduce kernel launch overhead, and runs N-gram language model (LM) fusion and phrase-level boosting on the GPU. According to the authors, this gives accurate and efficient decoding that is suitable for both research and production use.

📝 Abstract
While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present FlexCTC, a novel open-source toolkit for fully GPU-based beam decoding, designed for Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation that eliminates CPU-GPU synchronization and minimizes kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making the toolkit suitable for both research and production use.
Problem

Research questions and friction points this paper is trying to address.

Slow CPU-bound beam search in speech recognition
Lack of GPU-optimized CTC decoding solutions
Limited contextual abilities in existing CTC decoders
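
To make the first and third points concrete, here is a minimal pure-Python sketch of a textbook CTC prefix beam search, the kind of sequential, per-utterance loop the paper describes as slow. It is not FlexCTC's code. The names `ctc_prefix_beam_search` and `bonus_fn` are hypothetical: `bonus_fn(prefix, token)` is an assumed hook that returns an additive log-score, standing in for a weighted N-gram LM score or a phrase boost.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_prefix_beam_search(log_probs, beam_size=4, blank=0, bonus_fn=None):
    """log_probs: list of frames, each a list of per-token log-probabilities.
    bonus_fn (hypothetical hook): extra log-score for appending token to prefix.
    Returns (prefix, score) pairs, best first."""
    # prefix -> (log p of paths ending in blank, log p of paths ending in non-blank)
    beams = {(): (0.0, NEG_INF)}
    bonus = {(): 0.0}  # accumulated contextual bonus per prefix
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for tok, lp in enumerate(frame):
                if tok == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (tok,)
                if new_prefix not in bonus:
                    extra = bonus_fn(prefix, tok) if bonus_fn else 0.0
                    bonus[new_prefix] = bonus[prefix] + extra
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == tok:
                    # A repeated token extends the prefix only after a blank...
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ...otherwise it collapses into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + lp))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Prune to the best beam_size prefixes by acoustic score plus bonus.
        scored = sorted(nxt.items(),
                        key=lambda kv: logsumexp(*kv[1]) + bonus[kv[0]],
                        reverse=True)
        beams = dict(scored[:beam_size])
    return [(p, logsumexp(*s) + bonus[p]) for p, s in beams.items()]

# Two identical frames over the vocabulary {0: blank, 1: "a", 2: "b"}.
frames = [[math.log(p) for p in (0.6, 0.3, 0.1)]] * 2
plain = ctc_prefix_beam_search(frames, beam_size=2)
boosted = ctc_prefix_beam_search(
    frames, beam_size=2, bonus_fn=lambda prefix, tok: 2.0 if tok == 2 else 0.0)
print(plain[0][0], boosted[0][0])  # prints: (1,) (2,)
```

Greedy decoding would pick blank in both frames and return the empty string, while the beam search sums the three alignments of "a" (total probability 0.45) and prefers it. With the assumed bonus on token 2, the ranking flips to "b". The nested Python loops over frames, beams, and tokens are what a batched GPU decoder has to replace with tensor operations.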
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fully GPU-based beam decoding for CTC models
High-performance, fully batched GPU implementation using CUDA Graphs
Supports N-gram LM fusion and phrase boosting
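
As a rough illustration of what "fully batched" can mean in PyTorch, the sketch below expands and prunes the beams of every utterance in one tensor operation, with no Python loop over utterances and no host-side reads inside the step. It is a simplified, assumed example rather than FlexCTC's implementation: the function name `batched_beam_step` is hypothetical, and the step deliberately omits CTC blank and repeat merging as well as LM fusion.

```python
import torch

def batched_beam_step(beam_scores, frame_log_probs, beam_size):
    """One simplified expansion step for all utterances at once.

    beam_scores:     [B, K] running log-scores of K beams per utterance
    frame_log_probs: [B, V] log-probabilities of the current frame
    Returns top scores, parent beam indices, and token indices, each [B, beam_size].
    """
    batch, k = beam_scores.shape
    vocab = frame_log_probs.shape[-1]
    # Broadcast to every (beam, token) candidate: [B, K, V].
    cand = beam_scores.unsqueeze(-1) + frame_log_probs.unsqueeze(1)
    # One top-k over the flattened K*V candidates of each utterance.
    top_scores, flat_idx = cand.reshape(batch, k * vocab).topk(beam_size, dim=-1)
    parent = torch.div(flat_idx, vocab, rounding_mode="floor")
    token = flat_idx % vocab
    return top_scores, parent, token

device = "cuda" if torch.cuda.is_available() else "cpu"
# Two utterances, two beams each; only beam 0 is active at the start.
beam_scores = torch.tensor([[0.0, float("-inf")]] * 2, device=device)
frame = torch.tensor([[0.6, 0.3, 0.1], [0.2, 0.1, 0.7]], device=device).log()
scores, parent, token = batched_beam_step(beam_scores, frame, beam_size=2)
print(token.tolist(), parent.tolist())  # prints: [[0, 1], [2, 0]] [[0, 0], [0, 0]]
```

Because every tensor in the step has a fixed shape and nothing is copied back to the CPU, a step of this kind is the sort of computation that can in principle be captured and replayed with CUDA Graphs (PyTorch exposes capture through `torch.cuda.CUDAGraph`). How FlexCTC structures its own capture is not described in this summary.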