🤖 AI Summary
ASR outputs often contain errors that degrade downstream task performance. To address this, we propose LIR-ASR, a human-audition-inspired iterative correction framework featuring a novel three-stage “Listen–Imagine–Refine” mechanism. First, the ASR output is parsed (“Listen”). Next, a large language model generates phoneme-level variants, explicitly modeling speech uncertainty (“Imagine”). Finally, context-aware global optimization is performed under linguistic constraints via finite-state-machine-guided heuristic search (“Refine”), avoiding local optima while preserving semantic consistency. Evaluated on multilingual (Chinese and English) and multi-scenario ASR post-processing tasks, LIR-ASR achieves average reductions in character error rate (CER) and word error rate (WER) of up to 1.5 percentage points, significantly enhancing transcription robustness and accuracy.
📝 Abstract
Automatic Speech Recognition (ASR) systems remain prone to errors that affect downstream applications. In this paper, we propose LIR-ASR, a heuristically optimized iterative correction framework based on large language models (LLMs) and inspired by human auditory perception. LIR-ASR applies a "Listening-Imagining-Refining" strategy, generating phonetic variants and refining them in context. A heuristic optimization scheme based on a finite state machine (FSM) is introduced to prevent the correction process from being trapped in local optima, while rule-based constraints help maintain semantic fidelity. Experiments on both English and Chinese ASR outputs show that LIR-ASR achieves average reductions in CER/WER of up to 1.5 percentage points compared to baselines, demonstrating substantial accuracy gains in transcription.
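The Imagine/Refine interplay can be sketched as a toy pipeline. This is only an illustrative sketch, not the paper's implementation: the `CONFUSIONS` table stands in for LLM-generated phoneme-level variants, and a simple bigram count stands in for the FSM-guided, context-aware scoring; all names here (`imagine`, `refine`, `score`) are hypothetical.

```python
from itertools import product

# Hypothetical phonetic-confusion table standing in for the LLM's
# phoneme-level variant generation ("Imagine" stage).
CONFUSIONS = {
    "sea": ["see", "sea"],
    "there": ["their", "there"],
}

def imagine(tokens):
    """Expand each token into its set of phonetic variants."""
    return [CONFUSIONS.get(t, [t]) for t in tokens]

def score(candidate, lm_bigrams):
    """Toy context score: number of candidate bigrams attested in a
    reference set. Stands in for the framework's linguistic constraints."""
    return sum(1 for b in zip(candidate, candidate[1:]) if b in lm_bigrams)

def refine(tokens, lm_bigrams):
    """Global search over all variant combinations, keeping the
    best-scoring hypothesis (the paper instead uses FSM-guided
    heuristic search to avoid enumerating everything)."""
    best = max(product(*imagine(tokens)), key=lambda c: score(c, lm_bigrams))
    return list(best)

# Usage: "sea" is rewritten to "see" because its bigrams fit the context.
lm = {("i", "see"), ("see", "the")}
print(refine(["i", "sea", "the"], lm))  # → ['i', 'see', 'the']
```

Exhaustive `product` enumeration is exponential in sentence length, which is precisely why the paper replaces it with heuristic search guided by an FSM.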