VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current speech-fairness evaluations rely mostly on synthetic audio and multiple-choice tasks, which struggle to expose the generative biases that large audio-language models show in real-world interactions. This work proposes VIBE, an evaluation framework that pairs authentic human speech with open-ended generation tasks, such as personalized recommendations, so that stereotypical associations surface without predefined response options. The framework analyzes social attributes, including gender and accent, by comparing the distribution of model outputs across speaker groups. Across eleven state-of-the-art models, gender cues often induce larger output-distribution shifts than accent cues, indicating that these models reproduce social stereotypes. Because it requires no predefined options, the framework extends easily to new tasks.

📝 Abstract

Large Audio-Language Models (LALMs) are increasingly integrated into daily applications, yet their generative biases remain underexplored. Existing speech fairness benchmarks rely on synthetic speech and Multiple-Choice Questions (MCQs), both offering a fragmented view of fairness. We propose VIBE, a framework that evaluates generative bias through open-ended tasks such as personalized recommendations, using real-world human recordings. Unlike MCQs, our method allows stereotypical associations to manifest organically without predefined options, making it easily extensible to new tasks. Evaluating 11 state-of-the-art LALMs reveals systematic biases in realistic scenarios. We find that gender cues often trigger larger distributional shifts than accent cues, indicating that current LALMs reproduce social stereotypes.
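
The idea of a "distributional shift" between speaker groups can be illustrated with a small sketch. The abstract does not say which statistic VIBE uses, so the Jensen-Shannon divergence below is an assumption chosen for illustration, and the function names, category labels, and example responses are all hypothetical. The sketch assumes each open-ended model response has already been mapped to one category label.

```python
from collections import Counter
from math import log2


def category_distribution(labels, categories):
    """Turn a list of category labels into a probability vector over `categories`."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return [counts[c] / total for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two probability vectors.

    The value lies in [0, 1]: 0 for identical distributions,
    1 for distributions with no overlap.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distribution_shift(group_a, group_b):
    """Shift between the label distributions of two speaker groups (assumed metric)."""
    categories = sorted(set(group_a) | set(group_b))
    p = category_distribution(group_a, categories)
    q = category_distribution(group_b, categories)
    return js_divergence(p, q)


# Hypothetical labels assigned to recommendations generated for two voice groups.
responses_group_a = ["nursing", "teaching", "nursing", "engineering"]
responses_group_b = ["engineering", "engineering", "teaching", "engineering"]
print(distribution_shift(responses_group_a, responses_group_b))
```

Under this assumed setup, a larger value for a gender split than for an accent split on the same prompts would correspond to the pattern the abstract reports.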

Problem

Research questions and friction points this paper is trying to address.

generative bias

Large Audio-Language Models

fairness evaluation

real-world speech

social stereotypes

Innovation

Methods, ideas, or system contributions that make the work stand out.

generative bias

open-ended evaluation

real-world speech