🤖 AI Summary
This study examines how voice interaction, while enhancing the accessibility of generative AI, may exacerbate gender bias through paralinguistic cues such as pitch. It demonstrates for the first time that voice-enabled large language models systematically generate gender-stereotyped adjectives and occupations based on a speaker's vocal characteristics, exhibiting significantly stronger bias than in text-only interaction. Combining audio-enabled large language models, pitch-manipulation techniques, and a large-scale user survey (n=1,000), the research empirically confirms that voice alone can trigger implicit attribute inference, and it proposes pitch modulation as an effective mitigation strategy. Notably, infrequent chatbot users are the most wary of undisclosed attribute inference and the most likely to disengage when such practices are revealed, a finding that underscores the risks posed by this novel bias mechanism in voice interfaces and the urgent need for intervention.
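The summary does not spell out how stereotype shifts are measured; as a rough illustration of the kind of analysis involved, the sketch below counts gender-stereotyped adjectives in model responses under different voice conditions. The word lists, variable names, and response format are assumptions made for this example, not the study's actual protocol.

```python
# Illustrative sketch (not the paper's protocol): count gender-stereotyped
# adjectives in model responses collected under different voice conditions.
from collections import Counter

# Hypothetical, non-exhaustive stereotype lexicons for illustration only.
FEMININE_STEREOTYPED = {"gentle", "caring", "emotional", "graceful"}
MASCULINE_STEREOTYPED = {"assertive", "logical", "strong", "ambitious"}

def stereotype_counts(responses):
    """Tally stereotyped adjectives across a list of model response strings."""
    counts = Counter()
    for text in responses:
        for token in text.lower().split():
            word = token.strip(".,!?;:")
            if word in FEMININE_STEREOTYPED:
                counts["feminine"] += 1
            elif word in MASCULINE_STEREOTYPED:
                counts["masculine"] += 1
    return counts

# Compare the same prompt delivered in a higher- vs. lower-pitched voice;
# responses_high and responses_low would come from the audio-enabled model.
# print(stereotype_counts(responses_high), stereotype_counts(responses_low))
```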
📝 Abstract
Hundreds of millions of people rely on large language models (LLMs) for education, work, and even healthcare. Yet these models are known to reproduce and amplify social biases present in their training data. Moreover, text-based interfaces remain a barrier for many users, for example those with limited literacy, motor impairments, or mobile-only access. Voice interaction promises to expand accessibility, but unlike text, speech carries identity cues that users cannot easily mask, raising concerns that accessibility gains may come at the cost of equitable treatment. Here we show that audio-enabled LLMs exhibit systematic gender discrimination, shifting responses toward gender-stereotyped adjectives and occupations solely on the basis of speaker voice, and amplifying bias beyond that observed in text-based interaction. Thus, voice interfaces do not merely extend text models to a new modality but introduce distinct bias mechanisms tied to paralinguistic cues. Complementary survey evidence ($n=1,000$) shows that infrequent chatbot users are the most hesitant about undisclosed attribute inference and the most likely to disengage when such practices are revealed. To demonstrate a potential mitigation strategy, we show that pitch manipulation can systematically regulate gender-discriminatory outputs. Overall, our findings reveal a critical tension in AI development: efforts to expand accessibility through voice interfaces simultaneously create new pathways for discrimination, demanding that fairness and accessibility be addressed in tandem.
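As an illustration of the pitch-manipulation lever mentioned above, a minimal sketch follows that shifts the pitch of a recorded prompt before it is sent to an audio-enabled model. The choice of librosa and soundfile, the file names, and the three-semitone shift are assumptions for this example; the abstract does not specify the authors' tooling or pipeline.

```python
# Minimal sketch of pitch manipulation on a speech clip (illustrative only).
import librosa
import soundfile as sf

def shift_pitch(in_path: str, out_path: str, n_steps: float) -> None:
    """Load a speech clip, shift its pitch by n_steps semitones, and save it."""
    y, sr = librosa.load(in_path, sr=None)  # keep the original sample rate
    y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    sf.write(out_path, y_shifted, sr)

# Example: lower a recorded prompt by 3 semitones, then compare the model's
# adjective and occupation choices against the unmodified recording.
# shift_pitch("prompt_original.wav", "prompt_lowered.wav", n_steps=-3)
```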