MacST: Multi-Accent Speech Synthesis via Text Transliteration for Accent Conversion

📅 2024-09-14
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: High-quality parallel speech data, in which the same speaker utters the same content in two accents, is scarce for accent conversion. Method: We propose an LLM-driven accent conversion framework: (1) leveraging large language models to generate accented text transcriptions (e.g., American → British), (2) synthesizing speaker-consistent, multi-accent speech pairs using multilingual TTS, and (3) training a sequence-to-sequence accent conversion model on the synthetic data. Contribution/Results: This work introduces an LLM-based approach to constructing synthetic parallel accent corpora, removing the reliance on scarce real parallel recordings, and supports both native and non-native English speakers. Experiments indicate that the synthetic dataset improves model performance, achieving +0.42 in subjective MOS and +12.3% in objective ABX accent discrimination accuracy over baselines, supporting the use of synthetic data for accent modeling.

📝 Abstract
In accented voice conversion, or accent conversion, we seek to convert the accent in speech from one to another while preserving speaker identity and semantic content. In this study, we formulate a novel method for creating multi-accented speech samples, that is, pairs of accented speech samples by the same speaker, through text transliteration for training accent conversion systems. We begin by generating transliterated text with Large Language Models (LLMs), which is then fed into multilingual TTS models to synthesize accented English speech. As a reference system, we built a sequence-to-sequence model on the synthetic parallel corpus for accent conversion. We validated the proposed method for both native and non-native English speakers. Subjective and objective evaluations further validate our dataset's effectiveness in accent conversion studies.
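
The data-construction flow described in the abstract (transliterate the text, synthesize it with a multilingual TTS voice, pair the result with the same speaker's source-accent audio) can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: `build_parallel_corpus`, `AccentPair`, the two callable signatures, and the accent codes are all hypothetical, and the `toy_*` functions are placeholders standing in for a real LLM and a real TTS model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces; neither signature comes from the paper.
Transliterate = Callable[[str, str], str]      # (english_text, accent) -> transliterated text
Synthesize = Callable[[str, str, str], bytes]  # (text, accent, speaker_id) -> audio bytes


@dataclass
class AccentPair:
    """One parallel sample: same speaker, same content, two accents."""
    text: str
    speaker_id: str
    source_audio: bytes
    target_audio: bytes


def build_parallel_corpus(
    sentences: List[str],
    speaker_id: str,
    source_accent: str,
    target_accent: str,
    transliterate: Transliterate,
    synthesize: Synthesize,
) -> List[AccentPair]:
    """Create same-speaker accent pairs from plain English sentences."""
    pairs = []
    for text in sentences:
        # Source side: synthesize the original text in the source accent.
        source_audio = synthesize(text, source_accent, speaker_id)
        # Target side: transliterate first, then synthesize with the same speaker.
        accented_text = transliterate(text, target_accent)
        target_audio = synthesize(accented_text, target_accent, speaker_id)
        pairs.append(AccentPair(text, speaker_id, source_audio, target_audio))
    return pairs


# Toy stand-ins so the sketch runs without any model (assumptions, not real APIs).
def toy_transliterate(text: str, accent: str) -> str:
    return f"[{accent}] {text}"


def toy_synthesize(text: str, accent: str, speaker_id: str) -> bytes:
    return f"{speaker_id}|{accent}|{text}".encode("utf-8")


if __name__ == "__main__":
    corpus = build_parallel_corpus(
        ["hello world"], "spk1", "en-US", "hi", toy_transliterate, toy_synthesize
    )
    print(len(corpus), corpus[0].speaker_id)
```

Passing the transliteration and synthesis steps in as callables keeps the pairing logic independent of any particular LLM or TTS system; the resulting list of pairs is the kind of synthetic parallel corpus on which a sequence-to-sequence conversion model could then be trained.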

Problem

Research questions and friction points this paper is trying to address.

Accent Conversion
Speaker Style Preservation
Semantic Integrity

Innovation

Methods, ideas, or system contributions that make the work stand out.

MacST
Accent Transformation
Multilingual Speech Synthesis