Benchmarking and Mitigating MCQA Selection Bias of Large Vision-Language Models

2025-09-20
AI Summary
Large Vision-Language Models (LVLMs) exhibit significant option-selection biases in multiple-choice question answering (MCQA), such as positional or token-level preferences, yet this issue has not been systematically explored. To address this, we introduce the first fine-grained MCQA bias benchmark and propose a zero-shot, logit-level debiasing method applicable to frozen LVLMs without fine-tuning. Our approach estimates a bias vector from an ensemble of generic and context-aware prompts, partitions questions by difficulty using the semantic similarity of their options, and applies confidence-weighted logit correction during inference. We uncover a strong correlation between task difficulty and bias magnitude. Extensive evaluation across state-of-the-art LVLMs shows that our method substantially mitigates bias, improving average accuracy by up to 5.2% on high-difficulty samples and making fine-grained vision-language reasoning more robust.

Abstract
Large Vision-Language Models (LVLMs) have achieved strong performance on vision-language tasks, particularly Visual Question Answering (VQA). While prior work has explored unimodal biases in VQA, the problem of selection bias in Multiple-Choice Question Answering (MCQA), where models may favor specific option tokens (e.g., "A") or positions, remains underexplored. In this paper, we investigate both the presence and nature of selection bias in LVLMs through fine-grained MCQA benchmarks spanning easy, medium, and hard difficulty levels, defined by the semantic similarity of the options. We further propose an inference-time logit-level debiasing method that estimates an ensemble bias vector from general and contextual prompts and applies confidence-adaptive corrections to the model's output. Our method mitigates bias without retraining and is compatible with frozen LVLMs. Extensive experiments across several state-of-the-art models reveal consistent selection biases that intensify with task difficulty, and show that our mitigation approach significantly reduces bias while improving accuracy in challenging settings. This work offers new insights into the limitations of LVLMs in MCQA and presents a practical approach to improve their robustness in fine-grained visual reasoning. Datasets and code are available at: https://github.com/Atabuzzaman/Selection-Bias-of-LVLMs
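The abstract says that difficulty levels are defined by the semantic similarity of the options, but it does not give the exact rule. The following is a minimal sketch of one way such a split could work, assuming that more similar options make a harder question, that similarity is the mean pairwise cosine similarity of option embeddings, and that two cut-offs separate the levels. The function names, the toy embeddings, and the thresholds (0.5 and 0.8) are all hypothetical and are not taken from the paper or its repository.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(option_embeddings):
    """Average cosine similarity over all unordered pairs of option embeddings."""
    pairs = list(combinations(option_embeddings, 2))
    if not pairs:
        raise ValueError("need at least two options")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def difficulty_level(option_embeddings, medium_at=0.5, hard_at=0.8):
    """Assumed rule: the more similar the options, the harder the question.

    medium_at and hard_at are illustrative thresholds, not values from the paper.
    """
    similarity = mean_pairwise_similarity(option_embeddings)
    if similarity >= hard_at:
        return "hard"
    if similarity >= medium_at:
        return "medium"
    return "easy"
```

In practice the embeddings would come from a text encoder applied to each option string; any encoder that yields fixed-length vectors could be plugged in here.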
Problem

Research questions and friction points this paper is trying to address.

Investigating selection bias in LVLMs for multiple-choice question answering
Analyzing how models favor specific option tokens or positions in MCQA
Developing an inference-time debiasing method to improve LVLM robustness
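A preference for particular option tokens or positions can be made visible with a simple diagnostic: compare how often the model picks each label with how often that label is actually correct. The sketch below is a generic illustration of that idea, not the metric used in the paper; the function name `selection_gap` and the four-label default are assumptions.

```python
from collections import Counter


def selection_gap(predicted, gold, labels=("A", "B", "C", "D")):
    """Per-label difference between selection rate and ground-truth rate.

    A positive value means the model chooses that label more often than it
    is correct, which hints at a preference for that token or position.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        raise ValueError("need at least one example")
    n = len(predicted)
    predicted_counts = Counter(predicted)
    gold_counts = Counter(gold)
    return {
        label: (predicted_counts[label] - gold_counts[label]) / n
        for label in labels
    }
```

Running the same questions with the options shuffled and recomputing the gap would help separate a preference for a label token from a preference for a position, since shuffling changes which content sits under each label.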
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained MCQA benchmarks across difficulty levels
Inference-time logit-level debiasing with ensemble bias vectors
Confidence-adaptive corrections without model retraining
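The three points above can be sketched together as a small inference-time routine. This is an illustration under stated assumptions, not the authors' implementation: it assumes the bias vector is a weighted average of the option logits obtained from a general prompt and a contextual prompt, centred to sum to zero, and that the correction strength grows as the model's confidence (its maximum softmax probability) falls. The names `ensemble_bias` and `debias`, and the parameters `weight` and `max_strength`, are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_bias(general_logits, contextual_logits, weight=0.5):
    """Blend two per-option bias estimates and centre the result.

    Assumption: each input holds the option logits the frozen model produces
    for a bias-probing prompt (general or contextual, respectively).
    """
    blended = [
        weight * g + (1.0 - weight) * c
        for g, c in zip(general_logits, contextual_logits)
    ]
    mean = sum(blended) / len(blended)
    return [b - mean for b in blended]


def debias(logits, bias, max_strength=1.0):
    """Subtract the bias vector, scaled by how unsure the model is.

    Assumption: strength = max_strength * (1 - max softmax probability),
    so confident predictions are left almost unchanged.
    """
    confidence = max(softmax(logits))
    strength = max_strength * (1.0 - confidence)
    return [x - strength * b for x, b in zip(logits, bias)]
```

Because the routine only reads and adjusts option logits, it needs no gradient updates, which matches the claim that the method works with frozen models.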
Md. Atabuzzaman
Department of Computer Science, Virginia Tech
Ali Asgarov
Department of Computer Science, Virginia Tech
Chris Thomas
Virginia Tech
Computer Vision