The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs

📅 2026-03-15

🤖 AI Summary

This study investigates implicit bias in SpeechLLMs when spoken inputs vary in accent and gender presentation. Using voice cloning to hold linguistic content constant, the authors systematically vary speaker accent (six English accents, including Eastern European) and gender presentation (two) across 2,880 interactions with three SpeechLLMs. Adopting an intersectional lens, they combine pointwise LLM-judge ratings, pairwise comparisons, and Best-Worst Scaling, validated by human evaluation. The analysis finds consistent disparities at the intersection of accent and gender: Eastern European-accented voices, particularly female-presenting ones, receive lower helpfulness scores, even though the responses remain polite. LLM judges capture the direction of these biases, but human evaluators are significantly more sensitive and uncover sharper intersectional disparities.

📝 Abstract

Speech Large Language Models (SpeechLLMs) process spoken input directly, retaining cues such as accent and perceived gender that were previously removed in cascaded pipelines. This introduces speaker-identity-dependent variation in responses. We present a large-scale intersectional evaluation of accent and gender bias in three SpeechLLMs using 2,880 controlled interactions across six English accents and two gender presentations, keeping linguistic content constant through voice cloning. Using pointwise LLM-judge ratings, pairwise comparisons, and Best-Worst Scaling with human validation, we detect consistent disparities. Eastern European-accented speech receives lower helpfulness scores, particularly for female-presenting voices. The bias is implicit: responses remain polite but differ in helpfulness. While LLM judges capture the directional trend of these biases, human evaluators exhibit significantly higher sensitivity, uncovering sharper intersectional disparities.
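
For readers unfamiliar with Best-Worst Scaling, the following is a minimal sketch of the standard counting-based score: for each item, the number of times it was chosen best minus the number of times it was chosen worst, divided by the number of times it was shown. The abstract does not state which BWS scoring variant the authors use, so this is an assumption for illustration only. The names `BWSTrial` and `bws_scores` and the condition labels are hypothetical, and the trial data is made up, not taken from the paper.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class BWSTrial(NamedTuple):
    """One Best-Worst Scaling trial: the items shown plus the judge's picks."""
    items: tuple[str, ...]
    best: str
    worst: str


def bws_scores(trials: Iterable[BWSTrial]) -> dict[str, float]:
    """Counting-based BWS score per item: (#best - #worst) / #appearances.

    Scores lie in [-1, 1]; higher means the item was preferred more often.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for trial in trials:
        if trial.best not in trial.items or trial.worst not in trial.items:
            raise ValueError("best and worst must be among the shown items")
        seen.update(trial.items)
        best[trial.best] += 1
        worst[trial.worst] += 1
    return {item: (best[item] - worst[item]) / n for item, n in seen.items()}


# Made-up trials over four placeholder voice conditions (not results from the paper).
conditions = ("cond_a", "cond_b", "cond_c", "cond_d")
trials = [
    BWSTrial(conditions, best="cond_a", worst="cond_c"),
    BWSTrial(conditions, best="cond_b", worst="cond_c"),
    BWSTrial(conditions, best="cond_a", worst="cond_d"),
]
scores = bws_scores(trials)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:+.2f}")
```

In a study like this one, each item would plausibly be a model response to the same prompt spoken in a different accent and gender condition, so a systematically lower score for one condition would indicate a disparity in perceived helpfulness.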

Problem

Research questions and friction points this paper is trying to address.

SpeechLLMs

intersectional bias

accent bias

gender bias

speaker identity

Innovation

Methods, ideas, or system contributions that make the work stand out.

SpeechLLMs

intersectional bias

voice cloning