🤖 AI Summary
ASR outputs frequently contain errors that require manual post-editing. Existing LLM-based full-rewrite approaches suffer from high redundancy and low efficiency, while compact edit representations lack sufficient contextual modeling and accuracy. This paper proposes CEGER, a Context-Enhanced Granular Edit Representation framework that formalizes ASR error correction as a structured edit-instruction generation task. CEGER jointly incorporates local error context and global semantic constraints, and employs a deterministic expansion module to ensure lossless text reconstruction. By generating minimal, semantically grounded edit operations rather than full sequences, CEGER significantly reduces redundant token generation. On LibriSpeech, it achieves a 12.3% relative WER reduction over full-rewrite baselines and outperforms prior compact edit methods, balancing correction accuracy against computational efficiency.
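The exact CEGER command syntax is not given here, so the following is only a minimal Python sketch of how deterministic reconstruction from word-level edit commands can work. The `EditCommand` fields (`op`, `start`, `end`, `words`, `left_context`) and the `expand` function are hypothetical illustrations, not the paper's format; `left_context` loosely stands in for the idea of anchoring each edit to its surrounding text, and a mismatch is rejected rather than guessed at.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical command format for illustration only; not CEGER's actual syntax.
@dataclass
class EditCommand:
    op: str                      # "replace", "insert", or "delete"
    start: int                   # word index into the original ASR hypothesis
    end: int                     # exclusive end index (equal to start for "insert")
    words: List[str] = field(default_factory=list)  # new words to emit
    left_context: Optional[str] = None  # word expected just before `start` (anchor check)


def expand(hypothesis: str, commands: List[EditCommand]) -> str:
    """Rebuild corrected text by copying untouched words and applying each command once.

    Whitespace is normalized to single spaces because the text is split into words.
    """
    src = hypothesis.split()
    out: List[str] = []
    cursor = 0  # index of the next original word not yet copied
    for cmd in sorted(commands, key=lambda c: (c.start, c.end)):
        if cmd.op not in ("replace", "insert", "delete"):
            raise ValueError(f"unknown op: {cmd.op!r}")
        if not (cursor <= cmd.start <= cmd.end <= len(src)):
            raise ValueError(f"span out of range or overlapping: {cmd}")
        if cmd.op == "insert" and cmd.start != cmd.end:
            raise ValueError("an insert must use an empty span")
        if cmd.op == "delete" and cmd.words:
            raise ValueError("a delete must not carry new words")
        if cmd.left_context is not None:
            anchor = src[cmd.start - 1] if cmd.start > 0 else None
            if anchor != cmd.left_context:
                raise ValueError(f"context mismatch at {cmd.start}: {anchor!r}")
        out.extend(src[cursor:cmd.start])  # copy unchanged words verbatim
        out.extend(cmd.words)              # emit replacement or inserted words
        cursor = cmd.end                   # skip the replaced or deleted words
    out.extend(src[cursor:])               # copy the unchanged tail
    return " ".join(out)


hyp = "the cat sad on the the mat"
cmds = [
    EditCommand("replace", 2, 3, ["sat"], left_context="cat"),
    EditCommand("delete", 4, 5, left_context="on"),
]
print(expand(hyp, cmds))  # prints: the cat sat on the mat
```

In this toy case the model would emit two short commands carrying a single new word instead of regenerating all six words of the corrected sentence. Because unchanged words are copied verbatim and every command is validated before it is applied, the same hypothesis and commands always yield the same output.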
📝 Abstract
Although ASR technology has been widely adopted by industry and by large portions of the population, ASR systems still make errors that editors must correct through post-editing. While LLMs are powerful post-editing tools, baseline full-rewrite models are inefficient at inference time because they regenerate large amounts of text that was already correct. Compact edit representations exist, but they often lack the context needed for high accuracy. This paper introduces CEGER (Context-Enhanced Granular Edit Representation), a compact edit representation designed for accurate and efficient ASR post-editing. CEGER allows an LLM to generate a sequence of structured, fine-grained, contextually rich commands that modify the original ASR output, and a separate expansion module deterministically reconstructs the corrected text from those commands. In extensive experiments on the LibriSpeech dataset, CEGER achieves state-of-the-art accuracy, with the lowest word error rate (WER) among full-rewrite models and prior compact representations.
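WER is the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and its reference, divided by the number of reference words. The sketch below is a standard dynamic-programming implementation, not the paper's evaluation code; the function names are illustrative, and text normalization (casing, punctuation) is assumed to have been done beforehand. It also shows what a relative reduction means: a 12.3% relative WER reduction leaves the new WER at 87.7% of the baseline WER, whatever the baseline's absolute value.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions to turn ref into hyp (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: ref word r missing from hyp
                curr[j - 1] + 1,         # insertion: extra hyp word h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of one utterance: edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total errors over total reference words."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        errors += word_edit_distance(ref, hypothesis.split())
        words += len(ref)
    if words == 0:
        raise ValueError("no reference words")
    return errors / words


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed, e.g. 0.123 for a 12.3% relative reduction."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


print(wer("the cat sat on the mat", "the cat sad on the the mat"))  # 2 errors / 6 words
```

Summing errors and reference words across utterances, as `corpus_wer` does, is the common convention for reporting test-set WER; averaging per-utterance WERs instead would give short utterances disproportionate weight.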