Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech

📅 2025-05-21
🤖 AI Summary
This work addresses fine-grained prosodic control (specifically pitch, emotion, and speaking rate) in controllable text-to-speech (TTS) via textual persona descriptions. We propose the first persona-to-voice prompt rewriting framework, leveraging two LLM-based rewriting strategies to transform unstructured persona texts into structured, speech-oriented style prompts. To enhance control fidelity, we integrate prosody-disentangled acoustic modeling with tailored prompt engineering. Notably, we systematically uncover and quantify implicit societal biases, particularly gender bias, introduced by LLMs during persona rewriting, a previously unexplored issue. Extensive experiments demonstrate significant improvements in synthesized speech naturalness, intelligibility, and style consistency. Our approach achieves state-of-the-art performance across both objective metrics and multi-dimensional subjective evaluations, including MOS, SIM, and AB tests.

📝 Abstract
In this paper, we propose a novel framework to control voice style in prompt-based, controllable text-to-speech systems by leveraging textual personas as voice style prompts. We present two persona rewriting strategies to transform generic persona descriptions into speech-oriented prompts, enabling fine-grained manipulation of prosodic attributes such as pitch, emotion, and speaking rate. Experimental results demonstrate that our methods enhance the naturalness, clarity, and consistency of synthesized speech. Finally, we analyze implicit social biases introduced by LLM-based rewriting, with a focus on gender. We underscore voice style as a crucial factor for persona-driven AI dialogue systems.
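
The rewriting step described above can be illustrated with a minimal sketch. It assumes a generic instruction-following LLM and is not the paper's implementation: `REWRITE_INSTRUCTION`, `StylePrompt`, and `build_rewrite_request` are hypothetical names introduced here, and the instruction wording is invented for illustration. The sketch only shows how a generic persona could be wrapped into a rewriting request, and how the three prosodic attributes named in the abstract could be rendered as a speech-oriented prompt.

```python
from dataclasses import dataclass

# Hypothetical instruction text; the paper's actual rewriting prompts are not shown here.
REWRITE_INSTRUCTION = (
    "Rewrite the persona below as a one-sentence description of how this "
    "person sounds when speaking. Mention pitch, emotion, and speaking rate."
)


@dataclass
class StylePrompt:
    """Prosodic attributes named in the abstract (assumed free-text values)."""
    pitch: str
    emotion: str
    speaking_rate: str

    def to_text(self) -> str:
        # Render the attributes as a natural-language style prompt for a TTS model.
        return (
            f"A speaker with {self.pitch} pitch, a {self.emotion} tone, "
            f"and a {self.speaking_rate} speaking rate."
        )


def build_rewrite_request(persona: str) -> str:
    # Combine the instruction with the raw persona; the result would be sent to an LLM.
    return f"{REWRITE_INSTRUCTION}\n\nPersona: {persona.strip()}\nStyle prompt:"


if __name__ == "__main__":
    print(build_rewrite_request("I am a retired teacher who loves gardening."))
    print(StylePrompt("low", "calm", "slow").to_text())
```

In this sketch the LLM call itself is left out on purpose; any client could consume the string returned by `build_rewrite_request`, and the `StylePrompt` fields could be filled from its answer.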

Problem

Research questions and friction points this paper is trying to address.

Control voice style in text-to-speech using persona prompts
Transform persona descriptions into speech-oriented style prompts
Analyze social biases in LLM-based persona rewriting for voice

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging textual personas as voice style prompts
Rewriting persona descriptions into speech-oriented prompts
Enabling fine-grained control of prosodic attributes such as pitch, emotion, and speaking rate