LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of recognizing rare and out-of-vocabulary words—particularly named entities—in end-to-end automatic speech recognition (ASR), and to enable rapid domain adaptation from text-only data, this paper proposes a lightweight dynamic keyword biasing method. The approach embeds a word-level n-gram language model into an Aho-Corasick automaton, yielding an updatable unified context graph that is integrated with a Transducer decoder via shallow fusion for on-the-fly biasing. Crucially, the method preserves decoding latency while improving accuracy and robustness, avoiding the performance degradation commonly observed with conventional biasing techniques. Experiments across four languages on two public datasets and one private dataset demonstrate up to a 21.6% relative reduction in word error rate, supporting its cross-lingual and cross-domain generalizability.

📝 Abstract
Despite the recent success of end-to-end models for automatic speech recognition, recognizing rare and out-of-vocabulary words, as well as fast domain adaptation with text, remain challenging. Biasing toward special entities often degrades overall performance. We propose a lightweight on-the-fly method to improve automatic speech recognition performance by combining a bias list of named entities with a word-level n-gram language model through shallow fusion, based on the Aho-Corasick string matching algorithm. The Aho-Corasick algorithm has proved more efficient than other methods and allows fast context adaptation. The n-gram language model is introduced as a graph with fail and output arcs, where the arc weights are adapted from the n-gram probabilities. The language model serves as additional support for keyword biasing: combining it with the bias entities in a single context graph preserves overall performance. We demonstrate our findings on four languages and three datasets (two public, one private), including performance on named entities and out-of-vocabulary entities. We achieve up to 21.6% relative improvement in the general word error rate with no practical difference in the inverse real-time factor.
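The core mechanism in the abstract is an Aho-Corasick automaton over the bias keywords, with fail arcs (followed on mismatch) and output arcs (fired when a keyword, possibly a suffix match, completes). A minimal sketch of this matching side in Python; the class name `ACBiaser`, the example keywords, and the per-keyword bonuses are illustrative, and the paper's n-gram weight adaptation is omitted:

```python
from collections import deque

class ACBiaser:
    """Minimal Aho-Corasick automaton for weighted keyword biasing.

    Each keyword carries a bias bonus. During decoding, a hypothesis
    accumulates the bonuses of every keyword matched so far; the fail
    arcs let overlapping and suffix matches (e.g. "rich" inside
    "zurich") fire without rescanning. Illustrative sketch only, not
    the paper's implementation.
    """

    def __init__(self):
        self.goto = [{}]    # per-state character transitions (trie)
        self.fail = [0]     # fail arcs, set in build()
        self.output = [[]]  # output arcs: (keyword, bonus) fired at state

    def add_keyword(self, word, bonus):
        state = 0
        for ch in word:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.output[state].append((word, bonus))

    def build(self):
        # BFS over the trie: set each node's fail arc to the longest
        # proper suffix that is also a trie prefix, and merge the fail
        # target's output arcs so suffix keywords fire too.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.output[t] = self.output[t] + self.output[self.fail[t]]

    def bias_score(self, text):
        """Total bias bonus accumulated while scanning `text` once."""
        state, score = 0, 0.0
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]      # follow fail arcs on mismatch
            state = self.goto[state].get(ch, 0)
            for _keyword, bonus in self.output[state]:
                score += bonus                # fire output arcs
        return score
```

Scanning `"zurich"` with keywords `"zurich"` (bonus 2.0) and `"rich"` (bonus 1.0) yields 3.0, because the output arc merged along the fail chain fires the suffix keyword as well; this single-pass behavior is what makes the context graph cheap enough for on-the-fly decoding.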
Problem

Research questions and friction points this paper is trying to address.

Recognizing rare out-of-vocabulary words in ASR
Adapting ASR to new domains using text data
Integrating biasing methods without computational overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines NE bias list and n-gram LM
Integrates biasing into transducer ASR
Improves entity recognition and reduces WER
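The shallow-fusion integration named above can be sketched as rescoring each Transducer hypothesis with a weighted context-graph bonus. The helper name `shallow_fusion_score`, the weight `lam`, and the toy scores below are assumptions following the generic shallow-fusion recipe, not the paper's tuned setup:

```python
def shallow_fusion_score(asr_logprob, bias_bonus, lam=0.5):
    """Hypothesis score = transducer log-probability + lam * context-graph
    bonus. `lam` and the linear combination are the generic shallow-fusion
    recipe; the paper's exact weighting is not reproduced here."""
    return asr_logprob + lam * bias_bonus

# Toy rescoring: the correctly spelled named entity starts with a slightly
# worse acoustic score but wins once its keyword bonus is applied.
hyps = {"zurich": (-4.2, 3.0), "syrik": (-4.0, 0.0)}
best = max(hyps, key=lambda h: shallow_fusion_score(*hyps[h]))
```

Because the bonus is added at scoring time rather than baked into the model, the bias list and n-gram weights can be swapped without retraining, which is what enables the fast text-only domain adaptation the paper targets.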
Iuliia Thorbecke
Idiap Research Institute, Martigny, Switzerland; University of Zurich, Switzerland
Juan Pablo Zuluaga
Idiap Research Institute, Martigny, Switzerland; EPFL, Lausanne, Switzerland
Esaú Villatoro-Tello
Idiap Research Institute, Martigny, Switzerland
Andrés Carofilis
Idiap Research Institute, Martigny, Switzerland
Shashi Kumar
Idiap Research Institute, Martigny, Switzerland; EPFL, Lausanne, Switzerland
P. Motlícek
Idiap Research Institute, Martigny, Switzerland; Brno University of Technology, Czech Republic
Karthik Pandia
Speech AI, Uniphore, India
Aravind Ganapathiraju
Speech AI, Uniphore, India