Few-Shot Accent Synthesis for ASR with LLM-Guided Phoneme Editing

📅 2026-04-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the degradation of accented speech recognition under extremely low-resource conditions (fewer than 10 target-accent utterances). The authors adapt a text-to-speech (TTS) decoder to a target-accent speaker from those few samples and use large language model (LLM)-guided phoneme-level editing to produce accent-conditioned pronunciations; the resulting synthetic speech is used to fine-tune a self-supervised automatic speech recognition (ASR) model. The work is presented as the first to introduce LLM-driven phoneme editing into ultra-low-resource accent adaptation, and its matched-rate random phoneme baseline shows that perturbation in phoneme space alone is already an effective form of data augmentation. Experiments report consistent word error rate (WER) reductions on real accented speech, including cross-speaker evaluation and ultra-low data regimes.
📝 Abstract
Accented automatic speech recognition (ASR) often degrades due to the limited availability of accented training data. Prior work has explored accent modeling in low-resource settings, but existing approaches typically require minutes to hours of labeled speech, which may still be impractical for truly scarce accent scenarios. We propose a pipeline that adapts a text-to-speech (TTS) decoder to a target-accent speaker using fewer than ten reference utterances and employs large language model (LLM)-based phoneme editing to generate accent-conditioned pronunciations. The resulting synthetic speech is used to fine-tune a self-supervised ASR model. Experiments demonstrate consistent word error rate (WER) reductions on real accented speech, including cross-speaker evaluation and ultra-low data regimes. A matched-rate random phoneme baseline shows that phoneme-space perturbation itself is a strong form of augmentation, while LLM-guided edits provide additional gains through accent-conditioned structure.
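The matched-rate random phoneme baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the small ARPAbet-style inventory `PHONEMES`, the restriction to substitution-only edits on equal-length sequences, and the helper names `edit_rate` and `random_matched_edits`. The idea is to measure how many phonemes an LLM edit changed, then change the same fraction at random positions, so that the two augmentations differ in structure but not in rate.

```python
import random

# Hypothetical ARPAbet-style inventory; the paper's actual phoneme set is not given here.
PHONEMES = ["AA", "AE", "AH", "IY", "UW", "B", "D", "K", "S", "T", "TH", "Z"]


def edit_rate(original, edited):
    """Fraction of positions whose phoneme differs.

    Assumes substitution-only edits, so both sequences have the same length.
    """
    if len(original) != len(edited):
        raise ValueError("this sketch only handles equal-length sequences")
    if not original:
        return 0.0
    changed = sum(a != b for a, b in zip(original, edited))
    return changed / len(original)


def random_matched_edits(original, rate, rng):
    """Substitute random phonemes at roughly the given rate.

    The number of edits is rate * len(original), rounded with Python's round().
    The input list is left unmodified; a new list is returned.
    """
    n_edits = round(rate * len(original))
    positions = rng.sample(range(len(original)), n_edits)
    out = list(original)
    for i in positions:
        # Exclude the current phoneme so every chosen position really changes.
        choices = [p for p in PHONEMES if p != out[i]]
        out[i] = rng.choice(choices)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    original = ["TH", "IH", "NG", "K"]    # "think"
    llm_edited = ["T", "IH", "NG", "K"]   # illustrative accent-conditioned edit
    rate = edit_rate(original, llm_edited)
    print(rate, random_matched_edits(original, rate, rng))
```

Comparing ASR fine-tuned on the LLM-edited pronunciations against ASR fine-tuned on rate-matched random ones would then separate the effect of perturbation itself from the effect of accent-conditioned structure, which is the distinction the abstract draws. A fuller version would also need to handle insertions and deletions, which this sketch deliberately omits.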
Problem

Research questions and friction points this paper is trying to address.

Few-Shot Accent Synthesis
Accented ASR
Low-Resource Speech Recognition
Phoneme Editing
Accent Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-Shot Accent Synthesis
LLM-Guided Phoneme Editing
Low-Resource ASR
Self-Supervised Speech Recognition
Phoneme-Space Augmentation