Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

📅 2025-09-18
🤖 AI Summary
ASR outputs often contain errors that degrade downstream task performance. To address this, we propose LIR-ASR, an iterative correction framework inspired by human auditory perception and built around a three-stage “Listen–Imagine–Refine” mechanism. First, the ASR output is parsed (“Listen”). Next, a large language model generates phoneme-level variants, explicitly modeling the uncertainty in what was heard (“Imagine”). Finally, the candidates are refined in context by a heuristic search guided by a finite state machine (FSM) under linguistic constraints (“Refine”), which helps the correction avoid local optima while preserving semantic consistency. In post-processing experiments on Chinese and English ASR outputs across multiple scenarios, LIR-ASR reduces character error rate (CER) and word error rate (WER) by up to 1.5 percentage points on average compared to baselines, improving transcription robustness and accuracy.
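The three stages can be illustrated with a toy Python sketch. It is not the authors' implementation: the paper uses an LLM both to imagine variants and to judge them in context, whereas here a hand-written confusion table (`CONFUSIONS`) and a tiny bigram count table (`BIGRAMS`) stand in for the LLM. All names below are hypothetical.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: a hand-written table of phonetically confusable tokens stands in
# for the LLM that "imagines" variants in the paper.
CONFUSIONS: Dict[str, List[str]] = {
    "to": ["too", "two"],
    "too": ["to", "two"],
    "two": ["to", "too"],
    "sea": ["see"],
    "see": ["sea"],
}

# ASSUMPTION: toy bigram counts stand in for the LLM's sense of context.
BIGRAMS: Dict[Tuple[str, str], int] = {
    ("want", "to"): 7,
    ("to", "see"): 5,
    ("see", "the"): 3,
    ("the", "sea"): 4,
}


def listen(asr_output: str) -> List[str]:
    """Listen: parse the raw ASR hypothesis into lowercase tokens."""
    return asr_output.lower().split()


def imagine(tokens: List[str], i: int) -> List[str]:
    """Imagine: the heard token plus its phonetically similar variants."""
    return [tokens[i]] + CONFUSIONS.get(tokens[i], [])


def score(tokens: List[str]) -> int:
    """Contextual plausibility: sum of bigram counts (higher is better)."""
    return sum(BIGRAMS.get(pair, 0) for pair in zip(tokens, tokens[1:]))


def refine(tokens: List[str], max_rounds: int = 3) -> List[str]:
    """Refine: greedily swap in variants, position by position, while the score rises."""
    current = list(tokens)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(current)):
            best = current
            for cand in imagine(current, i):
                trial = current[:i] + [cand] + current[i + 1:]
                if score(trial) > score(best):
                    best = trial
            if best is not current:
                current, improved = best, True
        if not improved:
            break  # no single-token swap helps: a (possibly local) optimum
    return current


def lir_correct(asr_output: str) -> str:
    return " ".join(refine(listen(asr_output)))


print(lir_correct("I want too sea the sea"))  # -> i want to see the sea
```

This refine step is plain greedy hill climbing, so it can stall in a local optimum whenever two tokens must change together; that weakness is what the paper's FSM-guided heuristic search is meant to address.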


📝 Abstract
Automatic Speech Recognition (ASR) systems remain prone to errors that affect downstream applications. In this paper, we propose LIR-ASR, a heuristically optimized iterative correction framework using LLMs, inspired by human auditory perception. LIR-ASR applies a "Listening-Imagining-Refining" strategy, generating phonetic variants and refining them in context. A heuristic optimization guided by a finite state machine (FSM) keeps the correction process from being trapped in local optima, while rule-based constraints help maintain semantic fidelity. Experiments on both English and Chinese ASR outputs show that LIR-ASR achieves average CER/WER reductions of up to 1.5 percentage points compared to baselines, demonstrating substantial accuracy gains in transcription.
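To make the reported metric concrete: WER and CER are the edit distance between a hypothesis and its reference divided by the reference length, counted over words or over characters. The Python sketch below is a standard Levenshtein-based computation, not the paper's evaluation script, and the 10.0% and 8.5% figures at the end are hypothetical. They only show that "1.5 percentage points" is an absolute difference between error rates, not a 1.5% relative change.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference token r is missing
                curr[j - 1] + 1,         # insertion: extra hypothesis token h
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent: word-level edits / reference word count."""
    words = ref.split()
    return 100.0 * edit_distance(words, hyp.split()) / len(words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent (spaces ignored), typical for Chinese."""
    chars = list(ref.replace(" ", ""))
    return 100.0 * edit_distance(chars, list(hyp.replace(" ", ""))) / len(chars)


print(round(wer("i want to see the sea", "i want too sea the sea"), 2))  # 33.33
print(cer("今天天气", "今天天汽"))  # 25.0

# "Percentage points" are absolute differences between error rates.
baseline_wer, corrected_wer = 10.0, 8.5             # hypothetical, not from the paper
absolute_pp = baseline_wer - corrected_wer          # 1.5 percentage points
relative_pct = 100.0 * absolute_pp / baseline_wer   # 15.0 % relative reduction
```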
Problem


Reducing ASR errors that degrade downstream applications
Improving iterative correction with LLMs by drawing on human auditory perception
Avoiding local optima during correction while maintaining semantic fidelity
Innovation


Heuristically optimized iterative correction framework
Listening-Imagining-Refining strategy built on phonetic variants
Finite state machine (FSM) that keeps the search from being trapped in local optima
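The FSM idea can be illustrated with a small, hypothetical controller. The abstract does not specify the paper's states or transitions, so the `EXPLOIT`/`EXPLORE`/`STOP` states, the "kick" that accepts the least-bad edit, and the rule that freezes the kicked position are assumptions borrowed from generic iterated/tabu local search. The toy scorer rewards changing two tokens together, so every single edit from the starting hypothesis looks worse.

```python
from enum import Enum, auto
from typing import Callable, List, Optional, Sequence, Tuple


class State(Enum):
    EXPLOIT = auto()  # accept only strictly improving edits (hill climbing)
    EXPLORE = auto()  # stuck: force the least-bad edit to leave the local optimum
    STOP = auto()     # exploration budget spent: return the best state seen


def fsm_search(
    start: List[str],
    options: Sequence[Sequence[str]],
    score: Callable[[List[str]], float],
    max_kicks: int = 2,
) -> Tuple[List[str], float]:
    """Hypothetical FSM-controlled local search that remembers the best state seen."""
    current = list(start)
    best, best_score = list(current), score(current)
    state, kicks_left = State.EXPLOIT, max_kicks
    tabu: Optional[int] = None  # position changed by the last kick (frozen until the next kick)

    def moves() -> List[Tuple[float, List[str], int]]:
        """All single-position edits that do not touch the tabu position."""
        out: List[Tuple[float, List[str], int]] = []
        for i, cands in enumerate(options):
            if i == tabu:
                continue
            for c in cands:
                if c != current[i]:
                    trial = current[:i] + [c] + current[i + 1:]
                    out.append((score(trial), trial, i))
        return out

    while state is not State.STOP:
        candidates = moves()
        if state is State.EXPLOIT:
            cur_score = score(current)
            improving = [m for m in candidates if m[0] > cur_score]
            if improving:
                s, current, _ = max(improving, key=lambda m: m[0])
                if s > best_score:
                    best, best_score = list(current), s
            else:
                state = State.EXPLORE if kicks_left > 0 and candidates else State.STOP
        else:  # State.EXPLORE: take the least-bad edit, even if it lowers the score
            _, current, tabu = max(candidates, key=lambda m: m[0])
            kicks_left -= 1
            state = State.EXPLOIT
    return best, best_score


# Toy scorer: "a2 b2" is best, but each single edit from "a b" scores lower.
TABLE = {("a", "b"): 2, ("a2", "b"): 0, ("a", "b2"): 0, ("a2", "b2"): 5}
OPTIONS = [["a", "a2"], ["b", "b2"]]


def toy(tokens: List[str]) -> float:
    return TABLE[tuple(tokens)]


print(fsm_search(["a", "b"], OPTIONS, toy, max_kicks=0))  # (['a', 'b'], 2)
print(fsm_search(["a", "b"], OPTIONS, toy, max_kicks=2))  # (['a2', 'b2'], 5)
```

With `max_kicks=0` the controller reduces to plain hill climbing and stays at the local optimum; allowing two kicks lets it cross the low-scoring intermediate state and return the better joint correction. Because a best-so-far copy is kept, exploration never makes the final answer worse than hill climbing from the same start.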
Yutong Liu
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China.
Ziyue Zhang
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China.
Yongbin Yu
University of Electronic Science and Technology of China
Memristor, Neural Network, Natural Language Processing, Impulsive Control, Swarm Intelligence, EDA, MBSE
Xiangxiang Wang
University of Electronic Science and Technology of China
neural networks, time scales, nonlinear systems, impulsive control
Yuqing Cai
School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, China.
Nyima Tashi
School of Information Science and Technology, Tibet University, Lhasa, China.