"I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

📅 2024-12-26
🤖 AI Summary
Problem: Spoken named entity recognition (NER) generalizes poorly to previously unseen (out-of-vocabulary, OOV) person and location names, and manually annotating speech data is prohibitively expensive.
Method: We propose a synthetic data generation framework driven by a named entity dictionary (NED). A large language model (LLM) generates text sentences containing novel entities, a text-to-speech (TTS) system converts them to speech, and a noise-aware filter removes low-quality samples.
Contribution/Results: We introduce a novel zero-shot evaluation benchmark for spoken NER that targets unseen entities; propose an LLM+TTS co-generation paradigm with noise-aware filtering that yields low-cost, high-fidelity synthetic speech data; and achieve state-of-the-art performance in the in-domain, zero-shot domain adaptation, and fully zero-shot settings. We publicly release (i) an NED containing 8,853 entities, (ii) the new benchmark dataset, and (iii) all source code.
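
The NED-driven generation loop summarized above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' released code: the dictionary layout (entity type mapped to a list of names), the prompt wording in `build_prompt`, and the helpers `sample_entities` and `generate_dataset` are all hypothetical, and `echo_llm` and `fake_tts` are toy stubs standing in for a real LLM and TTS system. The noise-aware filtering step is not modeled here.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical NED layout: entity type -> entity surface forms.
NED = Dict[str, List[str]]


@dataclass
class SpokenNERSample:
    text: str                        # LLM-generated sentence
    entities: List[Tuple[str, str]]  # (surface form, entity type) sampled from the NED
    audio: bytes                     # TTS output for `text`


def sample_entities(ned: NED, k: int, rng: random.Random) -> List[Tuple[str, str]]:
    """Draw k distinct (entity, type) pairs uniformly from the flattened dictionary."""
    pool = [(name, etype) for etype, names in ned.items() for name in names]
    return rng.sample(pool, k)


def build_prompt(entities: List[Tuple[str, str]]) -> str:
    """Hypothetical prompt asking the LLM to use every sampled entity."""
    listed = "; ".join(f"{name} ({etype})" for name, etype in entities)
    return f"Write one natural sentence that mentions all of: {listed}."


def generate_dataset(
    ned: NED,
    n: int,
    llm: Callable[[str], str],    # stand-in for any LLM: prompt -> sentence
    tts: Callable[[str], bytes],  # stand-in for any TTS system: sentence -> audio
    k: int = 2,
    seed: int = 0,
) -> List[SpokenNERSample]:
    """Sample entities, ask the LLM for a sentence, then synthesize speech for it."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        entities = sample_entities(ned, k, rng)
        sentence = llm(build_prompt(entities))
        samples.append(SpokenNERSample(sentence, entities, tts(sentence)))
    return samples


# Toy stubs so the sketch runs without a real LLM or TTS system.
def echo_llm(prompt: str) -> str:
    """Parse the entity names back out of the prompt and echo them in a sentence."""
    listed = prompt.split(": ", 1)[1].rstrip(".")
    names = [item.split(" (")[0] for item in listed.split("; ")]
    return "We talked about " + " and ".join(names) + "."


def fake_tts(sentence: str) -> bytes:
    """Pretend 'audio': just the UTF-8 bytes of the sentence."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    toy_ned = {"PER": ["Alice", "Bob"], "LOC": ["Paris", "Xiamen"]}
    for s in generate_dataset(toy_ned, n=3, llm=echo_llm, tts=fake_tts):
        print(s.text, s.entities)
```

Passing the LLM and TTS systems in as plain callables keeps the sketch independent of any particular model API; a real setup would wrap its chosen models in functions with the same signatures.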

📝 Abstract
Spoken named entity recognition (NER) aims to identify named entities in speech and plays an important role in speech processing. New named entities appear every day; however, annotating Spoken NER data for them is costly. In this paper, we demonstrate that existing Spoken NER systems perform poorly on previously unseen named entities. To tackle this challenge, we propose a method for generating Spoken NER data from a named entity dictionary (NED) to reduce annotation costs. Specifically, we first use a large language model (LLM) to generate sentences from sampled named entities and then use a text-to-speech (TTS) system to generate the corresponding speech. Furthermore, we introduce a noise metric to filter out noisy data. To evaluate our approach, we release a novel Spoken NER benchmark along with a corresponding NED containing 8,853 entities. Experimental results show that our method achieves state-of-the-art (SOTA) performance in the in-domain, zero-shot domain adaptation, and fully zero-shot settings. Our data will be available at https://github.com/DeepLearnXMU/HeardU.
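
The noise metric is not defined on this page, so the following is only a toy stand-in for the filtering step, not the paper's metric. Assuming each generated sample keeps the list of entities the LLM was asked to include, it scores a sentence by the fraction of those entities that fail to appear in it verbatim and drops samples above a threshold. The names `missing_entity_rate` and `filter_noisy` and the default threshold are hypothetical.

```python
from typing import List, Tuple

# A generated sample: (sentence, entity strings the LLM was asked to include).
Sample = Tuple[str, List[str]]


def missing_entity_rate(sentence: str, entities: List[str]) -> float:
    """Toy noise score: fraction of requested entities absent from the sentence.

    Uses a case-sensitive substring test, so "Ann" counts as present in
    "Anna"; a stricter check would match on token boundaries.
    """
    if not entities:
        return 0.0
    missing = sum(1 for entity in entities if entity not in sentence)
    return missing / len(entities)


def filter_noisy(samples: List[Sample], threshold: float = 0.0) -> List[Sample]:
    """Keep samples whose noise score is at most `threshold` (0.0 = every entity present)."""
    return [(s, ents) for s, ents in samples if missing_entity_rate(s, ents) <= threshold]


if __name__ == "__main__":
    generated = [
        ("Alice flew from Xiamen to Paris.", ["Alice", "Xiamen", "Paris"]),
        ("Bob stayed at home all week.", ["Bob", "Paris"]),  # the LLM dropped "Paris"
    ]
    print(filter_noisy(generated))                 # keeps only the first sample
    print(filter_noisy(generated, threshold=0.5))  # keeps both samples
```

A text-side check like this only catches entities the LLM omitted or altered; noise introduced during speech synthesis would need a separate check on the generated audio, which this sketch does not attempt.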
Problem

Research questions and friction points this paper is trying to address.

Recognition of Novel Named Entities
Spoken NER Systems
Data Annotation Cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cost-Efficient NER Data Generation
TTS Integration for NER
Quality Control in Data Generation
Authors

Jiawei Yu · Xiamen University · Speech, Natural Language Processing
Xiang Geng · National Key Laboratory for Novel Software Technology, Nanjing University, China
Yuang Li · 2012 Lab, Huawei · Speech, NLP
Mengxin Ren · Huawei Translation Services Center, China
Wei Tang · Huawei Translation Services Center, China
Jiahuan Li · Meituan Inc. · Natural Language Processing
Zhibin Lan · Xiamen University · Natural Language Processing
Min Zhang · Huawei Translation Services Center, China
Hao Yang · Huawei Translation Services Center, China
Shujian Huang · School of Computer Science, Nanjing University · Natural Language Processing, Machine Translation, Multilingualism, Large Language Models
Jinsong Su · Xiamen University · Natural Language Processing, Deep Learning, Neural Machine Translation