The Role of Model Confidence on Bias Effects in Measured Uncertainties

📅 2025-06-20
🤖 AI Summary
This study addresses the accurate quantification of epistemic uncertainty (knowledge gaps) and aleatoric uncertainty (answer diversity) in open-ended visual question answering (VQA) by large language models (LLMs), revealing systematic biases that arise from interactions between prompt bias and model confidence. Method: We conduct controlled bias experiments with GPT-4o and Qwen2-VL, combining uncertainty decomposition, confidence-stratified analysis, and prompt interventions. Contribution/Results: We first demonstrate that prompt bias markedly exacerbates underestimation of epistemic uncertainty (i.e., overconfidence) at low confidence levels, while leaving the directional bias in aleatoric uncertainty unchanged. Critically, sensitivity to prompt bias for both uncertainty types escalates sharply as unbiased confidence decreases. Furthermore, we show that mitigating prompt bias significantly improves uncertainty calibration in GPT-4o, establishing the fundamental role of confidence-bias coupling in uncertainty estimation. These findings provide a principled framework for diagnosing and improving uncertainty-aware VQA.

📝 Abstract
With the growing adoption of Large Language Models (LLMs) for open-ended tasks, accurately assessing epistemic uncertainty, which reflects a model's lack of knowledge, has become crucial to ensuring reliable outcomes. However, quantifying epistemic uncertainty in such tasks is challenging due to the presence of aleatoric uncertainty, which arises from multiple valid answers. While bias can introduce noise into epistemic uncertainty estimation, it may also reduce noise from aleatoric uncertainty. To investigate this trade-off, we conduct experiments on Visual Question Answering (VQA) tasks and find that mitigating prompt-introduced bias improves uncertainty quantification in GPT-4o. Building on prior work showing that LLMs tend to copy input information when model confidence is low, we further analyze how these prompt biases affect measured epistemic and aleatoric uncertainty across varying bias-free confidence levels with GPT-4o and Qwen2-VL. We find that all considered biases induce greater changes in both uncertainties when bias-free model confidence is lower. Moreover, lower bias-free model confidence leads to greater underestimation of epistemic uncertainty (i.e. overconfidence) due to bias, whereas it has no significant effect on the direction of changes in aleatoric uncertainty estimation. These distinct effects deepen our understanding of bias mitigation for uncertainty quantification and potentially inform the development of more advanced techniques.
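The abstract's distinction between epistemic uncertainty (lack of knowledge) and aleatoric uncertainty (multiple valid answers) is commonly operationalized with an entropy decomposition over sampled answers. The paper's exact estimator is not reproduced here, so the following is a minimal sketch under the standard decomposition, assuming open-ended answers have already been clustered into discrete categories; the function names and inputs are illustrative:

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def decompose_uncertainty(dists):
    """Split total predictive uncertainty into aleatoric and epistemic parts.

    `dists` is a list of categorical distributions over clustered answers,
    e.g. one distribution per sampled model state or prompt variant.
    Total entropy of the mean distribution decomposes as:
      total = aleatoric (expected per-sample entropy)
            + epistemic (mutual information between answer and sample).
    """
    n = len(dists)
    k = len(dists[0])
    mean = [sum(d[i] for d in dists) / n for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(d) for d in dists) / n
    epistemic = total - aleatoric  # non-negative by Jensen's inequality
    return total, aleatoric, epistemic
```

Under this decomposition, samples that disagree confidently with one another yield high epistemic uncertainty (the model's knowledge is unstable), whereas samples that agree on a spread-out answer distribution yield high aleatoric uncertainty (the question itself admits many answers).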
Problem

Research questions and friction points this paper is trying to address.

Assessing epistemic uncertainty in LLMs to ensure reliable outcomes
Quantifying epistemic uncertainty in open-ended tasks where aleatoric uncertainty (multiple valid answers) is also present
Understanding how prompt-introduced bias affects uncertainty estimation in GPT-4o and Qwen2-VL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Showing that mitigating prompt-introduced bias improves uncertainty quantification in GPT-4o
Confidence-stratified analysis of how bias alters measured epistemic and aleatoric uncertainty
Finding that lower bias-free confidence amplifies bias-induced underestimation of epistemic uncertainty (overconfidence)