Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models perform suboptimally in incomplete-information games, yet the underlying failure mechanisms remain poorly understood. This study identifies and empirically validates two gaps in the models' internal decision-making: an “observation–belief gap” and a “belief–action gap,” which together expose brittle beliefs and inconsistent decisions. Using the open-weight models Llama 3.1, Qwen3, and gpt-oss, the authors show that the models' internal (implicit) beliefs about latent game states are substantially more accurate than their verbal reports, but that this accuracy degrades with multi-hop reasoning, exhibits primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. On the action side, implicitly converting internal beliefs into actions is weaker than conditioning on beliefs externalized in the prompt, yet neither form of belief conditioning consistently achieves higher game payoffs.

📝 Abstract

Large language models (LLMs) are increasingly tasked with strategic decision-making under incomplete information, such as in negotiation and policymaking. While LLMs can excel at many such tasks, they also fail in ways that are poorly understood. We shed light on these failures by uncovering two fundamental gaps in the internal mechanisms underlying the decision-making of LLMs in incomplete-information games, supported by experiments with open-weight models Llama 3.1, Qwen3, and gpt-oss. First, an observation-belief gap: LLMs encode internal beliefs about latent game states that are substantially more accurate than their own verbal reports, yet these beliefs are brittle. In particular, the belief accuracy degrades with multi-hop reasoning, exhibits primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. Second, a belief-action gap: The implicit conversion of internal beliefs into actions is weaker than that of the beliefs externalized in the prompt, yet neither belief-conditioning consistently achieves higher game payoffs. These results show how analyzing LLMs' internal processes can expose systematic vulnerabilities that warrant caution before deploying LLMs in strategic domains without robust guardrails.
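To make "Bayesian coherence" concrete, the following is a minimal sketch of one way such a check could be set up. It is not the paper's method. The toy game (an opponent with a "strong" or "weak" hand who can "bet" or "check"), the likelihood values, the reported beliefs, and the helper names `bayes_update`, `total_variation`, and `coherence_gaps` are all illustrative assumptions. The idea is to compute the exact posterior over the latent state after each observation and measure how far a model's stated belief sits from it.

```python
def bayes_update(prior, likelihood, observation):
    """Return the exact posterior over latent states after one observation."""
    unnorm = {s: prior[s] * likelihood[s][observation] for s in prior}
    total = sum(unnorm.values())
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return {s: p / total for s, p in unnorm.items()}


def total_variation(p, q):
    """Total variation distance between two distributions over the same states."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in p)


def coherence_gaps(prior, likelihood, observations, reported_beliefs):
    """Distance between each reported belief and the exact Bayesian posterior."""
    gaps = []
    belief = dict(prior)
    for obs, reported in zip(observations, reported_beliefs):
        belief = bayes_update(belief, likelihood, obs)
        gaps.append(total_variation(belief, reported))
    return gaps


# Hypothetical toy game: the latent state is the opponent's hand strength.
prior = {"strong": 0.5, "weak": 0.5}
likelihood = {
    "strong": {"bet": 0.8, "check": 0.2},
    "weak": {"bet": 0.4, "check": 0.6},
}
observations = ["bet", "bet", "check"]

# Hypothetical beliefs a model might report after each observation.
# Here the report never moves, so it cannot track the changing posterior.
reported_beliefs = [
    {"strong": 0.7, "weak": 0.3},
    {"strong": 0.7, "weak": 0.3},
    {"strong": 0.7, "weak": 0.3},
]

gaps = coherence_gaps(prior, likelihood, observations, reported_beliefs)
for step, gap in enumerate(gaps, start=1):
    print(f"step {step}: gap from Bayesian posterior = {gap:.3f}")
```

In this sketch, a gap that grows with the number of observations would be one signature of the drift the abstract describes. Measuring an actual model would require replacing `reported_beliefs` with beliefs elicited from the model, either verbally or from its internals.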

Problem

Research questions and friction points this paper is trying to address.

strategic decision-making

incomplete information

observation-belief gap

belief-action gap

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

observation-belief gap

belief-action gap

incomplete-information games