FAC-FACodec: Controllable Zero-Shot Foreign Accent Conversion with Factorized Speech Codec

📅 2025-10-12

🤖 AI Summary

Existing accent conversion (AC) methods, including foreign accent conversion (FAC), lack explicit control over conversion strength, which makes it hard to modify accent accurately while preserving speaker identity. To address this, we propose a controllable zero-shot FAC framework. Our approach uses a factorized speech codec to disentangle speech into three components: linguistic content, prosody (pitch contour and phoneme duration), and speaker identity. An explicit, user-controllable accent modification parameter adjusts pronunciation while preserving suprasegmental prosodic cues and speaker-specific characteristics. Experiments show conversion performance comparable to recent AC systems, with stronger preservation of speaker identity. The framework also supports user-adjustable control over accent strength, so conversion intensity can be tuned per user without parallel or speaker-specific training data.

📝 Abstract

Previous accent conversion (AC) methods, including foreign accent conversion (FAC), lack explicit control over the degree of modification. Because accent modification can alter the perceived speaker identity, balancing conversion strength and identity preservation is crucial. We present an AC framework that provides an explicit, user-controllable parameter for accent modification. The method targets pronunciation while preserving suprasegmental cues such as intonation and phoneme durations. Results show performance comparable to recent AC systems, stronger preservation of speaker identity, and unique support for controllable accent conversion.

Problem

Research questions and friction points this paper is trying to address.

Enabling controllable accent strength modification

Preserving speaker identity during accent conversion

Targeting pronunciation while maintaining suprasegmental speech cues

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit user-controllable parameter for accent modification

Targets pronunciation while preserving suprasegmental cues

Factorized speech codec enabling controllable zero-shot conversion
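
The points above can be made concrete with a small sketch. It is not the paper's implementation: the summary does not say how the control parameter acts on the codec, so linear interpolation of the content factor is an assumption chosen for illustration. `FactorizedSpeech`, `convert_accent`, `target_content`, and `strength` are hypothetical names, and the vectors are toy numbers rather than real codec outputs.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class FactorizedSpeech:
    """Hypothetical container for the three factors named in the summary."""
    content: List[Vector]  # per-frame pronunciation (linguistic content) features
    prosody: List[Vector]  # per-frame pitch and duration cues
    speaker: Vector        # one utterance-level identity embedding


def convert_accent(source: FactorizedSpeech,
                   target_content: List[Vector],
                   strength: float) -> FactorizedSpeech:
    """Blend only the content factor toward a target-accent version.

    strength = 0.0 keeps the source content; strength = 1.0 uses the target.
    Linear interpolation is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if len(target_content) != len(source.content):
        raise ValueError("frame counts must match so durations are preserved")

    blended = [
        [(1.0 - strength) * s + strength * t for s, t in zip(src_frame, tgt_frame)]
        for src_frame, tgt_frame in zip(source.content, target_content)
    ]
    # Prosody and speaker factors are passed through untouched.
    return FactorizedSpeech(content=blended,
                            prosody=source.prosody,
                            speaker=source.speaker)


# Toy example: two frames, two content dimensions.
source = FactorizedSpeech(
    content=[[0.0, 2.0], [4.0, 6.0]],
    prosody=[[120.0], [125.0]],
    speaker=[0.5, -0.5],
)
target_content = [[1.0, 0.0], [0.0, 2.0]]
half = convert_accent(source, target_content, strength=0.5)
```

With `strength=0.5`, each content frame lands midway between the source and target values, while `half.prosody` and `half.speaker` are the same as in `source`. This mirrors the stated goal of changing pronunciation while leaving intonation, durations, and speaker identity alone.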

🔎 Similar Papers

AccentBox: Towards High-Fidelity Zero-Shot Accent Generation