LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing ASR post-processing methods correct rare and domain-specific words poorly and are prone to over-correction. Method: This paper proposes a generative LLM-based correction framework that combines synthetically generated data with speech context. It jointly models the ASR system's N-best hypotheses and phoneme-level contextual information, and introduces a rule-guided, LLM-augmented synthetic data construction strategy that improves robustness on low-frequency words while mitigating over-correction. The approach incorporates phoneme embeddings, LLM fine-tuning, and N-best hypothesis rescoring. Contribution/Results: Evaluated on English and Japanese ASR benchmarks, the method significantly improves rare-word correction accuracy and consistently reduces both word error rate (WER) and character error rate (CER), demonstrating strong cross-lingual generalization.
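The joint modeling of N-best hypotheses and phoneme-level context can be illustrated with a prompt-construction sketch. This is a minimal, hypothetical example of how phonetic evidence might be surfaced to the LLM alongside the textual hypotheses; the function name, prompt wording, and phoneme strings are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: build a GER prompt that pairs each N-best ASR
# hypothesis with its phoneme sequence, so the LLM can weigh phonetic
# evidence before correcting (helping it avoid over-correction when
# two hypotheses are phonetically identical).

def build_ger_prompt(nbest: list[tuple[str, str]]) -> str:
    """nbest: (hypothesis_text, phoneme_sequence) pairs, best-first."""
    lines = ["Correct the transcription using the hypotheses and phonemes below."]
    for rank, (text, phonemes) in enumerate(nbest, start=1):
        lines.append(f"Hypothesis {rank}: {text}")
        lines.append(f"Phonemes {rank}: {phonemes}")
    lines.append("Corrected transcription:")
    return "\n".join(lines)

prompt = build_ger_prompt([
    ("the patient has nefritis",
     "DH AH P EY SH AH N T HH AE Z N EH F R AY T IH S"),
    ("the patient has nephritis",
     "DH AH P EY SH AH N T HH AE Z N EH F R AY T IH S"),
])
```

Here the two candidates share one phoneme sequence, signaling that the spelling difference is orthographic rather than acoustic, which is exactly the case where phonetic context helps pick the rare word "nephritis".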

📝 Abstract
Generative error correction (GER) with large language models (LLMs) has emerged as an effective post-processing approach for improving automatic speech recognition (ASR) performance. However, it often struggles with rare or domain-specific words due to limited training data. Furthermore, existing LLM-based GER approaches rely primarily on textual information and neglect phonetic cues, which leads to over-correction. To address these issues, we propose a novel LLM-based GER approach that targets rare words and incorporates phonetic information. First, we generate synthetic data containing rare words for fine-tuning the GER model. Second, we integrate the ASR system's N-best hypotheses with phonetic context to mitigate over-correction. Experimental results show that our method not only improves the correction of rare words but also reduces WER and CER across both English and Japanese datasets.
Problem

Research questions and friction points this paper is trying to address.

Improves rare word correction in ASR using synthetic data
Incorporates phonetic context to reduce over-correction
Enhances accuracy across English and Japanese datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic data for rare words
Integrates phonetic context to reduce over-correction
Jointly models the ASR system's N-best hypotheses for more accurate correction
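The rule-guided synthetic data idea can be sketched as generating (erroneous hypothesis, corrected reference) training pairs by injecting rare words into carrier sentences and simulating a phonetically plausible ASR confusion. The templates, word list, and confusion table below are illustrative assumptions, not the paper's actual rules or data.

```python
import random

# Hypothetical sketch of rule-guided synthetic data construction:
# place a rare word in a template sentence (the reference), then
# simulate an ASR error by substituting a phonetically similar
# confusion (the hypothesis). Fine-tuning on such pairs teaches the
# GER model to restore low-frequency words.

TEMPLATES = [
    "please schedule the {w} review",
    "the report mentions {w} twice",
]
CONFUSIONS = {"nephritis": "nefritis", "arrhythmia": "arithmia"}

def make_pair(rare_word: str, rng: random.Random) -> tuple[str, str]:
    """Return (ASR-style erroneous hypothesis, corrected reference)."""
    reference = rng.choice(TEMPLATES).format(w=rare_word)
    hypothesis = reference.replace(rare_word, CONFUSIONS.get(rare_word, rare_word))
    return hypothesis, reference

rng = random.Random(0)
hyp, ref = make_pair("nephritis", rng)
```

Pairing each erroneous hypothesis with its clean reference gives exactly the supervision signal a GER fine-tuning set needs, while restricting substitutions to phonetically similar confusions keeps the errors realistic.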
Natsuo Yamashita
Hitachi, Ltd., Japan
Masaaki Yamamoto
Hitachi, Ltd., Japan
Hiroaki Kokubo
Hitachi, Ltd., Japan
Yohei Kawaguchi
Hitachi, Ltd., Japan
Acoustic Signal Processing · Signal Processing · Machine Learning · Speech Processing · AI