Gender Bias in Instruction-Guided Speech Synthesis Models

📅 2025-02-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates implicit gender bias in instruction-guided text-to-speech (TTS) models when processing ambiguous occupational style prompts (e.g., “speak like a nurse”). We design a controllable expressive TTS experimental framework integrating prompt-based style interventions, acoustic feature analysis, and cross-model bias comparison. Our work is the first to systematically reveal that mainstream TTS models exhibit stereotyped responses to occupation–gender associations: prompts such as “nurse” or “teacher” consistently trigger significantly higher probabilities of female voice generation, with bias intensity increasing with model scale. Beyond empirically confirming the presence of latent sociocultural bias in instruction-driven speech synthesis, we establish a reproducible, quantitative bias evaluation paradigm grounded in acoustic and behavioral metrics. This provides both theoretical grounding and empirical evidence for developing fair, controllable, and socially aware TTS systems.

Technology Category

Application Category

📝 Abstract
Recent advancements in controllable expressive speech synthesis, especially in text-to-speech (TTS) models, have allowed for the generation of speech with specific styles guided by textual descriptions, known as style prompts. While this development enhances the flexibility and naturalness of synthesized speech, there remains a significant gap in understanding how these models handle vague or abstract style prompts. This study investigates the potential gender bias in how models interpret occupation-related prompts, specifically examining their responses to instructions like"Act like a nurse". We explore whether these models exhibit tendencies to amplify gender stereotypes when interpreting such prompts. Our experimental results reveal the model's tendency to exhibit gender bias for certain occupations. Moreover, models of different sizes show varying degrees of this bias across these occupations.
Problem

Research questions and friction points this paper is trying to address.

Gender bias in speech synthesis
Handling vague style prompts
Amplifying gender stereotypes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes gender bias in TTS models
Examines occupation-related style prompts
Compares bias across different model sizes
🔎 Similar Papers
No similar papers found.