Voicing Personas: Rewriting Persona Descriptions into Style Prompts for Controllable Text-to-Speech

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses fine-grained prosodic control (specifically pitch, emotion, and speaking rate) in controllable text-to-speech (TTS) via textual persona descriptions. We propose the first persona-to-voice prompt rewriting framework, which uses two LLM-based rewriting strategies to transform unstructured persona texts into structured speech-style prompts. To enhance control fidelity, we integrate prosody-disentangled acoustic modeling with tailored prompt engineering. We also systematically uncover and quantify implicit societal biases, particularly gender bias, that LLMs introduce during persona rewriting, a previously unexplored issue. Extensive experiments demonstrate significant improvements in the naturalness, intelligibility, and style consistency of synthesized speech. Our approach achieves state-of-the-art performance on both objective metrics and multi-dimensional subjective evaluations, including MOS, SIM, and AB tests.

📝 Abstract

In this paper, we propose a novel framework to control voice style in prompt-based, controllable text-to-speech systems by leveraging textual personas as voice style prompts. We present two persona rewriting strategies to transform generic persona descriptions into speech-oriented prompts, enabling fine-grained manipulation of prosodic attributes such as pitch, emotion, and speaking rate. Experimental results demonstrate that our methods enhance the naturalness, clarity, and consistency of synthesized speech. Finally, we analyze implicit social biases introduced by LLM-based rewriting, with a focus on gender. We underscore voice style as a crucial factor for persona-driven AI dialogue systems.

Problem

Research questions and friction points this paper is trying to address.

Control voice style in text-to-speech using persona prompts

Transform persona descriptions into speech-oriented style prompts

Analyze social biases in LLM-based persona rewriting for voice

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging textual personas as voice style prompts

Rewriting persona descriptions into speech-oriented prompts

Enabling fine-grained control of prosodic attributes such as pitch, emotion, and speaking rate
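The rewriting idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the names `StylePrompt`, `build_rewrite_request`, and `PROSODIC_ATTRIBUTES` are hypothetical, the instruction wording is invented for the example, and no LLM or TTS system is actually called. It shows how a generic persona could be wrapped in a rewriting instruction, and what a speech-oriented style prompt covering pitch, emotion, and speaking rate might look like once rendered.

```python
from dataclasses import dataclass

# The three prosodic attributes named in the abstract (tuple name is hypothetical).
PROSODIC_ATTRIBUTES = ("pitch", "emotion", "speaking rate")


@dataclass
class StylePrompt:
    """A speech-oriented style description (illustrative structure only)."""
    pitch: str
    emotion: str
    speaking_rate: str

    def render(self) -> str:
        # Flatten the attributes into one natural-language style prompt.
        return (
            f"A speaker with {self.pitch} pitch, sounding {self.emotion}, "
            f"talking at a {self.speaking_rate} pace."
        )


def build_rewrite_request(persona: str) -> str:
    """Build an instruction asking an LLM to rewrite a persona as a style prompt.

    The wording is an assumption made for this sketch; the paper's actual
    rewriting prompts are not given in the abstract.
    """
    attributes = ", ".join(PROSODIC_ATTRIBUTES)
    return (
        "Rewrite the persona below as a one-sentence description of how the "
        f"person's voice sounds. Cover these attributes: {attributes}.\n"
        f"Persona: {persona.strip()}"
    )


# A generic persona, and the request that would be sent to a rewriting LLM.
request = build_rewrite_request("I am a retired teacher who loves gardening.")

# A hand-written style prompt standing in for the LLM's rewritten output.
example = StylePrompt(pitch="low", emotion="calm", speaking_rate="slow")
```

Keeping the attributes in a small structure rather than free text makes it easy to vary one attribute at a time, which is also a convenient setup for checking whether a rewriter assigns different voice attributes to otherwise similar personas.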
