Few-Shot Accent Synthesis for ASR with LLM-Guided Phoneme Editing

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study targets accented speech recognition when fewer than ten target-accent utterances are available. The authors adapt a text-to-speech (TTS) decoder to the target-accent speaker from those few utterances and use large language model (LLM)-guided phoneme editing to produce accent-conditioned pronunciations; the resulting synthetic speech is then used to fine-tune a self-supervised automatic speech recognition (ASR) model. The work is described as the first to introduce LLM-driven phoneme editing into ultra-low-resource accent adaptation. A matched-rate random phoneme baseline shows that perturbation in phoneme space is by itself an effective form of data augmentation, with LLM-guided edits adding further gains. Experiments report consistent word error rate (WER) reductions on real accented speech, including cross-speaker evaluation and ultra-low data regimes.
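For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the ASR hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the function name `word_error_rate` and the whitespace tokenization are assumptions made here, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion (second "the").
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.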

📝 Abstract

Accented automatic speech recognition (ASR) often degrades due to the limited availability of accented training data. Prior work has explored accent modeling in low-resource settings, but existing approaches typically require minutes to hours of labeled speech, which may still be impractical for truly scarce accent scenarios. We propose a pipeline that adapts a text-to-speech (TTS) decoder to a target-accent speaker using fewer than ten reference utterances and employs large language model (LLM)-based phoneme editing to generate accent-conditioned pronunciations. The resulting synthetic speech is used to fine-tune a self-supervised ASR model. Experiments demonstrate consistent word error rate (WER) reductions on real accented speech, including cross-speaker evaluation and ultra-low data regimes. A matched-rate random phoneme baseline shows that phoneme-space perturbation itself is a strong form of augmentation, while LLM-guided edits provide additional gains through accent-conditioned structure.
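The matched-rate random baseline mentioned in the abstract can be pictured with a small sketch. The abstract does not spell out how the rate is matched, so the code below assumes the simplest reading: count how many phonemes an accent-conditioned edit changed, then substitute the same number of randomly chosen positions with random phonemes. The names `count_edits`, `random_matched_edit`, and `PHONEME_INVENTORY`, the substitution-only edit model, and the example pronunciations are all hypothetical and not taken from the paper.

```python
import random

# Illustrative ARPAbet-style subset; the paper's actual phoneme set is not
# given in this summary, so this inventory is an assumption.
PHONEME_INVENTORY = [
    "AA", "AE", "AH", "IY", "IH", "UW",
    "T", "D", "K", "S", "Z", "R", "L", "N", "M",
]


def count_edits(original, edited):
    """Count positions where two equal-length phoneme sequences differ."""
    if len(original) != len(edited):
        raise ValueError("this sketch only handles substitution edits")
    return sum(a != b for a, b in zip(original, edited))


def random_matched_edit(original, n_edits, rng):
    """Substitute exactly n_edits random positions with a different phoneme."""
    if n_edits > len(original):
        raise ValueError("cannot edit more positions than the sequence has")
    edited = list(original)
    for pos in rng.sample(range(len(original)), n_edits):
        # Exclude the current phoneme so every chosen position really changes.
        choices = [p for p in PHONEME_INVENTORY if p != original[pos]]
        edited[pos] = rng.choice(choices)
    return edited


if __name__ == "__main__":
    canonical = ["W", "AO", "T", "ER"]    # "water", canonical pronunciation
    accent_edit = ["W", "AO", "D", "AH"]  # hypothetical accent-conditioned edit
    rate = count_edits(canonical, accent_edit)
    baseline = random_matched_edit(canonical, rate, random.Random(0))
    print(rate, baseline)
```

Holding the number of edits fixed separates the effect of perturbing phonemes at all from the effect of where and how the accent-conditioned edits perturb them, which is the comparison the abstract draws.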

Problem

Research questions and friction points this paper is trying to address.

Few-Shot Accent Synthesis

Accented ASR

Low-Resource Speech Recognition

Phoneme Editing

Accent Adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-Shot Accent Synthesis

LLM-Guided Phoneme Editing

Low-Resource ASR

Self-Supervised Speech Recognition

Phoneme-Space Augmentation

🔎 Similar Papers

MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion

2024-09-14 · arXiv.org · Citations: 0

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation

2024-09-13 · arXiv.org · Citations: 1