🤖 AI Summary
This work identifies significant interaction biases between gender and phonation type (e.g., breathy, creaky) in speech foundation models during speech continuation, a novel task requiring coherent extension of a single audio stream. The task explicitly probes speaker-similarity preservation, voice-quality fidelity, and text-level bias. Systematic evaluation reveals that female prompts disproportionately trigger regression toward modal phonation and exacerbate text-level gender bias. All evaluated models, SpiritLM (base and expressive), VAE-GSLM, and SpeechGPT, exhibit voice-quality bias against female voices; notably, VAE-GSLM, the model that achieves sufficient continuation coherence, manifests the strongest text-level bias. This study establishes the first dedicated benchmark for fairness assessment in speech continuation, introducing an empirically grounded paradigm for evaluating demographic and phonatory fairness in speech foundation models.
📝 Abstract
Speech Continuation (SC) is the task of generating a coherent extension of a spoken prompt while preserving both semantic context and speaker identity. Because SC is constrained to a single audio stream, it offers a more direct setting for probing biases in speech foundation models than dialogue does. In this work we present the first systematic evaluation of bias in SC, investigating how gender and phonation type (breathy, creaky, end-creak) affect continuation behaviour. We evaluate three recent models, SpiritLM (base and expressive), VAE-GSLM, and SpeechGPT, across speaker similarity, voice-quality preservation, and text-based bias metrics. Results show that while both speaker similarity and coherence remain a challenge, textual evaluations reveal significant model-gender interactions: once coherence is sufficiently high (as for VAE-GSLM), gender effects emerge on text metrics such as agency and sentence polarity. In addition, continuations revert toward modal phonation more strongly for female prompts than for male ones, revealing a systematic voice-quality bias. These findings highlight SC as a controlled probe of socially relevant representational biases in speech foundation models, and suggest that it will become an increasingly informative diagnostic as continuation quality improves.