🤖 AI Summary
Do large language models (LLMs) satisfy rational norms, particularly logical consistency and coherence of belief strength? This work investigates whether coherence norms of rationality apply to language models at all. It introduces the Minimal Assent Connection (MAC), a proposal that maps a model's internal next-token probabilities onto an interpretable, quantitative notion of credence (strength of belief), allowing logical and probabilistic coherence norms to be stated for language models in a unified way. Methodologically, the paper combines formal modeling, probabilistic semantics, and philosophical theories of rationality, grounding credence assignments exclusively in each model's native next-token distribution. The authors argue that coherence norms apply to some language models but not to others, which marks out principled boundaries for treating LLMs as rational agents. Because rationality is closely tied to predicting and explaining behavior, this question bears directly on AI safety, alignment, and the interpretability of model behavior.
📝 Abstract
Do norms of rationality apply to machine learning models, in particular language models? In this paper we investigate this question by focusing on a special subset of rational norms: coherence norms. We consider both logical coherence norms and coherence norms tied to the strength of belief. To make sense of the latter, we introduce the Minimal Assent Connection (MAC) and propose a new account of credence, which captures the strength of belief in language models. This proposal uniformly assigns strength of belief simply on the basis of model-internal next-token probabilities. We argue that rational norms tied to coherence do apply to some language models, but not to others. This issue is significant since rationality is closely tied to predicting and explaining behavior, and thus it is connected to considerations about AI safety and alignment, as well as understanding model behavior more generally.
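The abstract only sketches the core idea: credence is read off directly from a model's next-token distribution. The snippet below is a minimal illustration of that idea, not the paper's implementation; the prompt template, the use of GPT-2 as a stand-in model, the "Yes"/"No" answer tokens, and the renormalization over that pair are all assumptions introduced here for concreteness.

```python
# Minimal sketch of a MAC-style credence readout (assumed setup, not the paper's code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; any causal LM exposes the same next-token distribution
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def credence(statement: str) -> float:
    """Proxy for strength of belief: the model's next-token probability of
    assenting ("Yes") versus dissenting ("No"), renormalized over the pair."""
    prompt = f"Question: Is the following true? {statement}\nAnswer (Yes or No):"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]        # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    p_yes = probs[tokenizer.encode(" Yes")[0]].item()  # leading space matters for GPT-2 BPE
    p_no = probs[tokenizer.encode(" No")[0]].item()
    return p_yes / (p_yes + p_no)

# One probabilistic coherence check: credences in P and not-P should sum to roughly 1.
p = credence("Paris is the capital of France.")
q = credence("Paris is not the capital of France.")
print(f"cr(P) = {p:.3f}, cr(not P) = {q:.3f}, sum = {p + q:.3f}")
```

Under this reading, coherence norms become testable constraints on the extracted credences (e.g. complementation, as above), which is how next-token probabilities could ground the kind of unified logical and probabilistic evaluation the summary describes.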