Hearing the Order: Investigating Selection Bias in Large Audio-Language Models

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically reveals, for the first time, a significant ordering bias in large audio-language models (LALMs) on multiple-choice tasks: the ordering of answer options induces performance variation of up to 24% and can alter model rankings, severely compromising evaluation reliability. The authors conduct multi-model, multi-benchmark experiments across six state-of-the-art LALMs and three established benchmarks, including their speech-augmented variants, using diverse answer-order permutation strategies in ablation analyses. Results demonstrate that all evaluated models exhibit this bias; notably, a simple, lightweight permutation-based mitigation strategy substantially reduces it in most settings. Beyond exposing a critical flaw in current audio-language evaluation paradigms, namely uncontrolled answer-order effects, this work provides an immediately deployable, plug-and-play bias-correction method, establishing foundational principles for the fair, robust evaluation and deployment of LALMs.

📝 Abstract
Large audio-language models (LALMs) are often used in tasks that involve reasoning over ordered options. An open question is whether their predictions are influenced by the order of answer choices, which would indicate a form of selection bias and undermine their reliability. In this paper, we identify and analyze this problem in LALMs. We demonstrate that no model is immune to this bias through extensive experiments on six LALMs across three widely used benchmarks and their spoken counterparts. Shuffling the order of answer options can cause performance fluctuations of up to 24% and even change model rankings, raising concerns about the reliability of current evaluation practices. We also study permutation-based strategies and show that they can mitigate bias in most cases. Our work represents the first systematic investigation of this issue in LALMs, and we hope it raises awareness and motivates further research in this direction.
Problem

Research questions and friction points this paper is trying to address.

Investigating selection bias in audio-language models' predictions
Analyzing how answer order affects model performance and rankings
Developing permutation strategies to mitigate this selection bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Permutation-based strategies mitigate selection bias
Extensive experiments reveal performance fluctuations up to 24%
Systematic investigation addresses order influence in LALMs
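The permutation-based mitigation above can be sketched roughly as follows: evaluate each question under every cyclic rotation of its answer options so each option occupies each position once, then take a majority vote over the options actually chosen. This is a minimal illustration, not the paper's exact procedure; the `score_fn` interface and the cyclic-rotation choice are assumptions for the sketch.

```python
from collections import Counter

def debiased_answer(score_fn, question, options):
    """Majority vote over cyclic rotations of the answer options.

    score_fn(question, options) -> index of the option the model picks.
    (Hypothetical interface; the paper's actual setup may differ.)
    """
    n = len(options)
    votes = Counter()
    for shift in range(n):
        # Rotate so that every option appears in every position exactly once.
        permuted = options[shift:] + options[:shift]
        picked = permuted[score_fn(question, permuted)]
        votes[picked] += 1
    # The option chosen most often across orderings is the debiased answer.
    return votes.most_common(1)[0][0]
```

A model whose choice depends only on option content will pick the same option under every rotation, while a purely position-biased model's picks spread across all options, so aggregation dilutes the positional effect.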