🤖 AI Summary
This work investigates whether large language models (LLMs) exhibit Bayesian-style stochastic decision-making—i.e., whether their outputs approximate posterior sampling (stochastic) or maximum a posteriori (MAP) estimation (deterministic). Contrary to the common assumption that sampling at a non-zero temperature yields genuine randomness, the study reveals pervasive *conditionally deterministic* behavior: LLM outputs remain highly predictable given context, even under stochastic decoding.
Method: The authors propose a Gibbs-sampling–inspired prior inference framework, integrating temperature scanning, cross-model consistency validation, and decision-pattern classification to systematically detect and quantify this latent determinism.
Contribution/Results: The analysis demonstrates that ignoring such determinism induces severe bias in prior estimation. The proposed method effectively prevents the inference of "false priors," substantially improving the reliability and interpretability of cognitive modeling with LLMs.
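The core check—distinguishing stochastic from deterministic decision patterns—can be sketched as a repeated-sampling entropy test. This is an illustrative sketch only: the function names, the entropy threshold, and the stand-in "models" are assumptions, not the paper's implementation.

```python
import math
import random
from collections import Counter

def classify_decision_pattern(decide, n_trials=200, entropy_threshold=0.1):
    """Label a decision procedure 'deterministic' or 'stochastic'.

    `decide` is a zero-argument callable returning a hashable response
    (e.g. a wrapper querying an LLM with a fixed prompt at T > 0).
    We repeat the query and compute the empirical entropy of responses;
    near-zero entropy flags conditionally deterministic behavior even
    though the decoding temperature is non-zero.
    """
    counts = Counter(decide() for _ in range(n_trials))
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in counts.values())
    label = "deterministic" if entropy < entropy_threshold else "stochastic"
    return label, entropy

# Two hypothetical stand-ins: one always returns its argmax answer
# (MAP-like), one genuinely samples from a posterior over two answers.
rng = random.Random(0)
map_like = lambda: "A"                                 # always the mode
sampler = lambda: "A" if rng.random() < 0.7 else "B"   # posterior sampling

print(classify_decision_pattern(map_like))   # deterministic (entropy 0)
print(classify_decision_pattern(sampler))    # stochastic
```

In the paper's setting, such a check would be run across temperatures and models before trusting a Gibbs-sampling trajectory, since a deterministic responder makes the chain collapse onto a false prior.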
📝 Abstract
Language models are essentially probability distributions over token sequences. Auto-regressive models generate sentences by iteratively computing and sampling from the distribution of the next token. This iterative sampling introduces stochasticity, leading to the assumption that language models make probabilistic decisions, similar to sampling from unknown distributions. Building on this assumption, prior research has used simulated Gibbs sampling, inspired by experiments designed to elicit human priors, to infer the priors of language models. In this paper, we revisit a critical question: Do language models possess Bayesian brains? Our findings show that under certain conditions, language models can exhibit near-deterministic decision-making, such as producing maximum likelihood estimates, even with a non-zero sampling temperature. This challenges the sampling assumption and undermines previous methods for eliciting human-like priors. Furthermore, we demonstrate that without proper scrutiny, a system with deterministic behavior undergoing simulated Gibbs sampling can converge to a "false prior." To address this, we propose a straightforward approach to distinguish between stochastic and deterministic decision patterns in Gibbs sampling, helping to prevent the inference of misleading language model priors. We experiment on a variety of large language models to identify their decision patterns under various circumstances. Our results provide key insights into the decision-making of large language models.