LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study asks whether large language models (LLMs) update their probabilistic beliefs consistently with Bayesian principles when given new evidence, a critical capability for reasoning under uncertainty. The authors treat LLMs as information processing rules and use the “information processing gap” to quantify internal (in)consistencies in how the models update their beliefs. Their experiments evaluate multiple approaches by which an LLM can incorporate evidence: some produce (nearly) Bayesian updates, while others appear to rely on a learned heuristic. Notably, the non-Bayesian heuristic updates often outperform exact Bayesian computation on downstream tasks, which indicates that the LLMs' probabilistic models of the world are misspecified. The study also shows how the measure can serve as a diagnostic for identifying issues in LLM-powered inferential systems.
📝 Abstract
Modern AI systems are being deployed in complex domains such as medicine, science, and law, where it is important that they not only produce correct answers, but also represent and update uncertain beliefs about the world as new evidence arrives. We introduce the novel technique of studying LLMs as information processing rules and utilize the information processing gap to study the internal (in)consistencies of how LLMs update their probabilistic beliefs from evidence. Our extensive experiments evaluate multiple approaches in which LLMs can incorporate evidence into their beliefs. Some of these approaches produce (nearly) Bayesian updates; others seem to use a learned heuristic. Surprisingly, the non-Bayesian heuristic updates often outperform exact Bayesian computation in terms of downstream task performance -- indicating the LLMs' probabilistic models of the world are misspecified. Lastly, we show how our measure can provide diagnostics to identify issues with LLM-powered inferential systems.
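To make the notion of a "Bayesian update" concrete, the following is a minimal sketch of comparing a stated posterior against the posterior implied by Bayes' rule. It is not the paper's information processing gap, whose definition is not given here; the KL divergence is swapped in as a generic discrepancy measure, and the function names, the two-hypothesis setup, and all numbers (including the "stated" posterior, which stands in for a value elicited from a model) are hypothetical and chosen only for illustration.

```python
import math


def bayes_update(prior, likelihood):
    """Return the posterior P(h | e) from a prior P(h) and likelihood P(e | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(e), the normalizing constant
    if evidence == 0:
        raise ValueError("evidence has zero probability under the prior")
    return {h: p / evidence for h, p in unnormalized.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q[h] > 0 wherever p[h] > 0."""
    return sum(p[h] * math.log(p[h] / q[h]) for h in p if p[h] > 0)


# Hypothetical numbers: a rare condition and a fairly accurate test.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.90, "healthy": 0.05}  # P(positive test | h)

# Posterior implied by Bayes' rule for the prior and likelihood above.
reference = bayes_update(prior, likelihood)

# Hypothetical posterior, standing in for what a model might state directly.
stated = {"disease": 0.60, "healthy": 0.40}

# Illustrative discrepancy between the two (not the paper's metric).
discrepancy = kl_divergence(reference, stated)
print(reference, discrepancy)
```

A discrepancy of zero would mean the stated posterior matches the one implied by the same prior and likelihood; a larger value indicates a less consistent update under this illustrative measure.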
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Bayesian inference
probabilistic beliefs
information processing
belief updating
Innovation

Methods, ideas, or system contributions that make the work stand out.

information processing gap
Bayesian updating
probabilistic belief
LLM consistency
belief misspecification