Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI

📅 2025-01-10
🤖 AI Summary
In goal-oriented dialogues, general-purpose ASR systems often misrecognize user speech, and existing correction methods struggle when no per-user data is available and utterances show high linguistic flexibility (e.g., lexical/syntactic variation). To address this, the authors propose a context-aware, real-time correction method. The core contribution is a bidirectional context-enhancement mechanism that jointly exploits dialogue state and task semantics: (1) lexical- and semantic-similarity-based rescoring of n-best ASR hypotheses against contextual information, and (2) dynamic context retrieval and selection via phonetic speech-text alignment, without relying on named entity recognition or historical user data. The method integrates LLM-powered context augmentation, semantic and lexical matching, and dialogue-state-driven dynamic ranking. Evaluated on real-world home-improvement and cooking tasks, it improves correction recall by 34% and F1 by 16% while maintaining precision and false positive rate, and raises user satisfaction by 0.8–1 point on a 5-point scale.
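The rescoring step in (1) can be sketched as a reranker that scores each n-best hypothesis against the dialogue context. The scorer below is a deliberately simple stand-in: it blends a character-level `difflib` ratio with token Jaccard overlap, whereas the paper uses LLM-augmented semantic matching; the 0.5/0.5 blend weights and the example strings are illustrative assumptions.

```python
from difflib import SequenceMatcher


def lexical_similarity(hypothesis: str, context: str) -> float:
    """Blend a character-level ratio with token-overlap (Jaccard) similarity."""
    char_ratio = SequenceMatcher(None, hypothesis.lower(), context.lower()).ratio()
    h_tokens = set(hypothesis.lower().split())
    c_tokens = set(context.lower().split())
    union = h_tokens | c_tokens
    jaccard = len(h_tokens & c_tokens) / len(union) if union else 0.0
    return 0.5 * char_ratio + 0.5 * jaccard


def rerank_hypotheses(nbest: list[str], context: str) -> list[str]:
    """Rank n-best ASR hypotheses by their similarity to the dialogue context."""
    return sorted(nbest, key=lambda h: lexical_similarity(h, context), reverse=True)


# Hypothetical home-improvement example: the context shares "breaker"
# with exactly one hypothesis, so that hypothesis is promoted to rank 1.
nbest = ["turn off the beaker", "turn off the breaker", "turn of the baker"]
context = "home repair: reset the circuit breaker in the electrical panel"
ranked = rerank_hypotheses(nbest, context)
print(ranked[0])  # "turn off the breaker"
```

In the paper, this lexical signal is combined with semantic similarity from an LLM; a production version would also weight hypotheses by their original ASR confidence rather than discarding it.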

📝 Abstract
General-purpose automatic speech recognition (ASR) systems do not always perform well in goal-oriented dialogue. Existing ASR correction methods rely on prior user data or named entities. We extend correction to tasks that have no prior user data and exhibit linguistic flexibility such as lexical and syntactic variations. We propose a novel context augmentation with a large language model and a ranking strategy that incorporates contextual information from the dialogue states of a goal-oriented conversational AI and its tasks. Our method ranks (1) n-best ASR hypotheses by their lexical and semantic similarity with context and (2) context by phonetic correspondence with ASR hypotheses. Evaluated in home improvement and cooking domains with real-world users, our method improves recall and F1 of correction by 34% and 16%, respectively, while maintaining precision and false positive rate. Users rated 0.8–1 point (out of 5) higher when our correction method worked properly, with no decrease due to false positives.
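The second ranking direction, context by phonetic correspondence with ASR hypotheses, can be sketched with a classic Soundex encoding as the phonetic matcher. Soundex is only a stand-in for the paper's actual speech-text alignment, and the context terms and hypothesis below are invented for illustration.

```python
def soundex(word: str) -> str:
    """Classic Soundex: initial letter plus three digits for consonant classes."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code
    return (encoded + "000")[:4]


def rank_context_by_phonetics(context_terms: list[str], hypothesis: str) -> list[str]:
    """Rank candidate context terms by how many of their tokens sound like
    tokens in the ASR hypothesis."""
    hyp_codes = {soundex(tok) for tok in hypothesis.split()}

    def score(term: str) -> int:
        return sum(soundex(tok) in hyp_codes for tok in term.split())

    return sorted(context_terms, key=score, reverse=True)


# Hypothetical example: the ASR misheard "roller" as "ruler"; the two share
# a Soundex code, so the right context term still ranks first.
terms = ["circuit breaker", "paint roller", "caulk gun"]
print(rank_context_by_phonetics(terms, "where is the paint ruler"))
```

The retrieved context term can then feed back into the hypothesis rescoring above, which is the "bidirectional" aspect of the ranking strategy.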
Problem

Research questions and friction points this paper is trying to address.

Speech Recognition Errors
Targeted Dialogue
Language Variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Ranking Strategy
Speech Recognition Error Correction