🤖 AI Summary
Large language models (LLMs) exhibit an apparent paradox—overconfidence in initial judgments followed by rapid self-doubt upon challenge—yet the underlying cognitive mechanism remains unexplained.
Method: We introduce a novel experimental paradigm that decouples answer selection from confidence calibration, revealing a “choice-supportive bias”: LLMs reinforce their selected answers, systematically undervalue counterevidence, and overreact to inconsistent feedback. Using controlled prompting and multi-turn feedback analysis, we evaluate this bias across Gemma 3, GPT-4o, and o1-preview.
Contribution/Results: We demonstrate its cross-model and cross-domain robustness; empirical decision updates significantly deviate from Bayesian norms and are not attributable to memory interference. This work provides the first systematic deconstruction of the “stubborn-yet-fickle” paradox in LLMs, offering both theoretical insight into their metacognitive limitations and a methodological framework for improving calibration mechanisms.
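To make the paradigm concrete, here is a minimal sketch of how such a two-stage confidence elicitation could be run. The prompt wording, the advisor framing, and the use of the OpenAI client are illustrative assumptions, not the authors' released code.

```python
# Sketch of the two-stage paradigm (illustrative assumptions, see above).
import openai

client = openai.OpenAI()

def ask(messages: list[dict]) -> str:
    """One stateless completion; the model sees only `messages`."""
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content

question = ("Which city lies further north: (A) Paris or (B) Vancouver?\n"
            "Give your answer and a confidence from 0 to 100.")

# Stage 1: elicit an initial answer plus confidence.
initial = ask([{"role": "user", "content": question}])

advice = "An advisor with 70% historical accuracy says the answer is B."
followup = advice + "\nGive your final answer and a confidence from 0 to 100."

# Stage 2a ("shown"): the model's initial commitment stays in context.
shown = ask([
    {"role": "user", "content": question},
    {"role": "assistant", "content": initial},
    {"role": "user", "content": followup},
])

# Stage 2b ("hidden"): same question and advice, but the initial answer is
# omitted, so the stateless model has no memory of its prior commitment,
# a manipulation impossible with human participants.
hidden = ask([{"role": "user", "content": question + "\n" + followup}])

# Choice-supportive bias predicts higher confidence and fewer answer changes
# in the "shown" condition than in the "hidden" condition.
print(shown, hidden, sep="\n---\n")
```

Comparing the "shown" and "hidden" conditions isolates the effect of a visible prior commitment from the informational content of the advice itself.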
📝 Abstract
Large language models (LLMs) exhibit strikingly conflicting behaviors: they can appear steadfastly overconfident in their initial answers while simultaneously being prone to excessive doubt when challenged. To investigate this apparent paradox, we developed a novel experimental paradigm that exploits the unique ability to obtain confidence estimates from LLMs without creating memory of their initial judgments, something impossible with human participants. We show that LLMs (Gemma 3, GPT-4o, and o1-preview) exhibit a pronounced choice-supportive bias that inflates their confidence in their chosen answer, resulting in a marked resistance to changing their minds. We further demonstrate that LLMs markedly overweight inconsistent relative to consistent advice, in a fashion that deviates qualitatively from normative Bayesian updating. Finally, we demonstrate that these two mechanisms, a drive to maintain consistency with prior commitments and hypersensitivity to contradictory feedback, parsimoniously capture LLM behavior in a different domain. Together, these findings furnish a mechanistic account of LLM confidence that explains both their stubbornness and their excessive sensitivity to criticism.
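As a concrete reference for the Bayesian norm invoked above, here is a minimal log-odds formulation in our own notation (not the paper's): with current confidence c in the chosen answer and an advisor of accuracy q, Bayes' rule prescribes updates of equal magnitude whether the advice agrees or disagrees.

```latex
% Normative benchmark (illustrative notation): c = prior confidence in the
% chosen answer, c' = posterior confidence, q = advisor accuracy.
\[
  \log\frac{c'}{1-c'} \;=\; \log\frac{c}{1-c} \;\pm\; \log\frac{q}{1-q}
\]
% "+" when the advice agrees with the current answer, "-" when it disagrees.
% A Bayesian agent shifts its log-odds confidence by the same amount in both
% cases; the reported hypersensitivity to contradictory feedback means the
% disagreeing shift is empirically much larger than the agreeing one.
```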