FAC-FACodec: Controllable Zero-Shot Foreign Accent Conversion with Factorized Speech Codec

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing accent conversion (AC) methods, particularly foreign accent conversion (FAC), lack explicit control over conversion strength, which makes it hard to achieve accurate accent modification and speaker identity preservation at the same time. To address this, we propose the first controllable zero-shot FAC framework. Our approach uses a factorized speech codec to disentangle speech into three components: linguistic content, prosody (pitch contour and phoneme duration), and speaker identity. An explicit, user-controllable accent modification parameter lets the method adjust phonetic features while preserving suprasegmental prosodic cues and speaker-specific characteristics. Experiments show conversion quality on par with recent AC systems and stronger preservation of speaker identity. The framework also supports user-adjustable control over accent strength, enabling intensity-tuned FAC without parallel or speaker-specific training data.
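
The idea of a single strength parameter acting only on the content stream can be illustrated with a minimal sketch. This is not the paper's implementation: the `FactorizedSpeech` container, the `convert_accent` function, and the use of linear interpolation between source and target-accent content vectors are all assumptions made for illustration. The summary does not say how the parameter is applied inside the codec.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class FactorizedSpeech:
    """Hypothetical container for the three factors named in the summary."""
    content: List[Vector]  # per-frame linguistic content (pronunciation)
    prosody: List[Vector]  # per-frame pitch / duration cues
    speaker: Vector        # utterance-level speaker identity


def blend_content(source: List[Vector], target: List[Vector],
                  strength: float) -> List[Vector]:
    """Linearly interpolate content frames (an assumed mechanism).

    strength = 0.0 keeps the source pronunciation unchanged,
    strength = 1.0 uses the target-accent content entirely.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    if len(source) != len(target):
        raise ValueError("source and target must have the same frame count")
    return [
        [(1.0 - strength) * s + strength * t for s, t in zip(src, tgt)]
        for src, tgt in zip(source, target)
    ]


def convert_accent(speech: FactorizedSpeech, target_content: List[Vector],
                   strength: float) -> FactorizedSpeech:
    """Modify only the content factor; prosody and speaker pass through."""
    return FactorizedSpeech(
        content=blend_content(speech.content, target_content, strength),
        prosody=speech.prosody,
        speaker=speech.speaker,
    )


# Toy usage with two frames of two-dimensional content vectors.
source = FactorizedSpeech(
    content=[[1.0, 0.0], [0.0, 1.0]],
    prosody=[[120.0], [118.0]],
    speaker=[0.3, 0.7],
)
target_content = [[3.0, 2.0], [2.0, 3.0]]
half = convert_accent(source, target_content, strength=0.5)
```

In this sketch the prosody and speaker factors are returned untouched, which mirrors the stated goal of changing pronunciation while keeping suprasegmental cues and speaker identity. A real system would decode the resulting factors back to audio with the codec's decoder.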

📝 Abstract
Previous accent conversion (AC) methods, including foreign accent conversion (FAC), lack explicit control over the degree of modification. Because accent modification can alter the perceived speaker identity, balancing conversion strength and identity preservation is crucial. We present an AC framework that provides an explicit, user-controllable parameter for accent modification. The method targets pronunciation while preserving suprasegmental cues such as intonation and phoneme durations. Results show performance comparable to recent AC systems, stronger preservation of speaker identity, and unique support for controllable accent conversion.
Problem

Research questions and friction points this paper is trying to address.

Enabling controllable accent strength modification
Preserving speaker identity during accent conversion
Targeting pronunciation while maintaining suprasegmental speech cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicit user-controllable parameter for accent modification
Targets pronunciation while preserving suprasegmental cues
Factorized speech codec enabling controllable zero-shot conversion
Yurii Halychanskyi
The Grainger College of Engineering, Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
Cameron Churchwell
The Grainger College of Engineering, Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
Yutong Wen
The Grainger College of Engineering, Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, IL, USA
Volodymyr Kindratenko
University of Illinois at Urbana-Champaign
HPCAI