Generative Annotation for ASR Named Entity Correction

📅 2025-08-28
🤖 AI Summary
End-to-end automatic speech recognition (ASR) systems frequently misrecognize domain-specific named entities (NEs), compromising downstream task performance. Existing named entity correction (NEC) approaches based on phoneme-level edit distance struggle to localize errors when the misrecognized words and the correct entity differ substantially in form. To address this, we propose an NEC framework that combines speech sound features with generative annotation: it first retrieves candidate entities via acoustic similarity, then uses a generative model to mark erroneous entity spans in the ASR transcript and replace them with the correct entities. Because the approach does not depend on the rigid alignment imposed by conventional edit-distance matching, it improves correction accuracy, particularly for NEs whose transcribed form diverges substantially from the ground truth. Evaluation on both open-source and self-constructed test sets demonstrates substantial gains in entity accuracy. To foster reproducibility and further research, we will release our self-constructed test set and training data.
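
The retrieval step can be pictured as a nearest-neighbour search over an entity lexicon. The sketch below is a minimal illustration, assuming each lexicon entity and each ASR segment have already been encoded as fixed-length vectors; the toy vectors, the entity names, and the helpers `cosine` and `retrieve_candidates` are hypothetical, and cosine similarity with top-k ranking is a stand-in, not the feature extractor or scoring function the paper uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_candidates(
    query: Sequence[float],
    entity_index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k entities whose (hypothetical) sound vectors are most similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in entity_index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real speech sound features (assumption).
    index = {
        "Kubernetes": [0.9, 0.1, 0.0],
        "Kafka": [0.2, 0.9, 0.1],
        "Cassandra": [0.0, 0.2, 0.95],
    }
    utterance = [0.85, 0.2, 0.05]
    print([name for name, _ in retrieve_candidates(utterance, index, k=2)])  # ['Kubernetes', 'Kafka']
```

Keeping only the top k candidates keeps the downstream generative step small: it only has to consider a short list of plausible entities rather than the whole lexicon.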

📝 Abstract
End-to-end automatic speech recognition systems often fail to transcribe domain-specific named entities, causing catastrophic failures in downstream tasks. Numerous fast and lightweight named entity correction (NEC) models have been proposed in recent years. These models, which mainly rely on phonetic-level edit distance algorithms, have shown impressive performance. However, when the forms of the wrongly transcribed word(s) and the ground-truth entity differ significantly, these methods often fail to locate the wrongly transcribed words in the hypothesis, which limits their usefulness. We propose a novel NEC method that uses speech sound features to retrieve candidate entities. Using the speech sound features and the candidate entities, we design an innovative generative method that annotates entity errors in ASR transcripts and replaces the erroneous text with the correct entities. This method remains effective when word forms differ. We test our method on open-source and self-constructed test sets. The results demonstrate that our NEC method significantly improves entity accuracy. We will open source our self-constructed test set and training data.
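
To make the limitation of edit-distance NEC concrete, here is a hypothetical sliding-span matcher of the general kind the abstract describes, not any specific published system. It compares every window of up to `max_span` hypothesis words against an entity's phoneme sequence using Levenshtein distance and accepts the best window under a normalized threshold. The hand-written ARPAbet-style phonemes, the `threshold` of 0.4, and the function names are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, pb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete pa
                curr[j - 1] + 1,           # insert pb
                prev[j - 1] + (pa != pb),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def locate_span(
    hyp_phonemes: List[List[str]],
    entity_phonemes: List[str],
    max_span: int = 3,
    threshold: float = 0.4,
) -> Optional[Tuple[int, int, float]]:
    """Find the word span [start, end) whose phonemes best match the entity.

    Returns (start, end, normalized_distance), or None if no span is under the threshold.
    """
    best: Optional[Tuple[int, int, float]] = None
    for start in range(len(hyp_phonemes)):
        for end in range(start + 1, min(start + max_span, len(hyp_phonemes)) + 1):
            span = [p for word in hyp_phonemes[start:end] for p in word]
            dist = edit_distance(span, entity_phonemes) / max(len(span), len(entity_phonemes))
            if dist <= threshold and (best is None or dist < best[2]):
                best = (start, end, dist)
    return best


if __name__ == "__main__":
    entity = ["K", "UW", "B", "ER", "N", "EH", "T", "IY", "Z"]  # "Kubernetes"
    # "deploy on cooper netties": close in form to the entity.
    similar = [["D", "IH", "P", "L", "OY"], ["AA", "N"],
               ["K", "UW", "P", "ER"], ["N", "EH", "T", "IY", "Z"]]
    # "deploy kates": a short form that shares few phonemes with the entity.
    different = [["D", "IH", "P", "L", "OY"], ["K", "EY", "T", "S"]]
    print(locate_span(similar, entity))    # (2, 4, 0.111...): one substituted phoneme
    print(locate_span(different, entity))  # None
```

In the first case, "cooper netties" differs from "Kubernetes" by a single phoneme, so the matcher finds the span. In the second, a short form such as "kates" shares too few phonemes with the entity, every window exceeds the threshold, and the error cannot be located. This is the word-form-difference scenario the abstract targets.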
Problem

Research questions and friction points this paper is trying to address.

Correcting domain-specific named entities in ASR outputs
Locating misrecognized words whose form differs substantially from the correct entity
Improving entity accuracy when word forms differ significantly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses speech sound features for candidate retrieval
Generative method annotates entity errors
Replaces incorrect text with correct entities
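
The annotate-then-replace idea in the list above can be illustrated with a small post-processing routine. The sketch assumes a hypothetical inline markup, `[[misrecognized span=>Correct Entity]]`, that a generative model might emit over a copy of the transcript; the markup, the function name `apply_annotation`, and the consistency check are illustrative choices, not the paper's actual output format. Because the model marks the erroneous span directly in the text, no phoneme-level alignment is needed to decide where the entity goes.

```python
import re
from typing import List, Tuple

# Hypothetical markup (assumption): [[misrecognized span=>Correct Entity]]
TAG = re.compile(r"\[\[(.+?)=>(.+?)\]\]")


def apply_annotation(hypothesis: str, annotated: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each tagged span with its entity and return (corrected_text, pairs).

    Raises ValueError if stripping the tags does not reproduce the original
    hypothesis, i.e. the generator altered text outside the tagged spans.
    """
    pairs = TAG.findall(annotated)  # list of (wrong_span, entity) tuples
    restored = TAG.sub(lambda m: m.group(1), annotated)
    if restored != hypothesis:
        raise ValueError("annotation changed text outside the tagged spans")
    corrected = TAG.sub(lambda m: m.group(2), annotated)
    return corrected, pairs


if __name__ == "__main__":
    hypothesis = "deploy on cooper netties today"
    annotated = "deploy on [[cooper netties=>Kubernetes]] today"
    print(apply_annotation(hypothesis, annotated))
    # ('deploy on Kubernetes today', [('cooper netties', 'Kubernetes')])
```

The consistency check rejects outputs in which the generator rewrote untagged text, a cheap guard against hallucinated edits before the corrected transcript is passed downstream.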
Authors
Yuanchang Luo (2012@Huawei)
Daimeng Wei (Huawei Translation Service Center, Beijing, China)
Shaojun Li (Engineer, 2012 Lab, Huawei Co. LTD)
Hengchao Shang (Huawei Translation Service Center, Beijing, China)
Jiaxin Guo (Huawei Translation Service Center, Beijing, China)
Zongyao Li (Huawei Translation Service Center, Beijing, China)
Zhanglin Wu (2012 Lab, Huawei Co. LTD; Machine Translation, Natural Language Processing)
Xiaoyu Chen (Huawei Translation Service Center, Beijing, China)
Zhiqiang Rao (Huawei; NLP)
Jinlong Yang (Huawei Translation Service Center, Beijing, China)
Hao Yang (Huawei Translation Service Center, Beijing, China)