Are language models rational? The case of coherence norms and belief revision

📅 2024-06-05
🏛️ arXiv.org
📈 Citations: 13
Influential: 0
🤖 AI Summary
Do large language models (LLMs) satisfy rational norms—in particular, logical consistency and coherence in strength of belief? This paper investigates whether coherence norms of rationality apply to language models. It introduces the Minimal Assent Connection (MAC), a new account of credence that uniformly assigns strength of belief on the basis of a model's internal next-token probabilities, making both logical and probabilistic coherence norms applicable to language models. Combining formal modeling, probabilistic semantics, and philosophical theories of rationality, the authors argue that coherence norms apply to some language models but not to others. The question is significant because rationality is closely tied to predicting and explaining behavior, with direct implications for AI safety, alignment, and understanding model behavior more generally.

📝 Abstract
Do norms of rationality apply to machine learning models, in particular language models? In this paper we investigate this question by focusing on a special subset of rational norms: coherence norms. We consider both logical coherence norms as well as coherence norms tied to the strength of belief. To make sense of the latter, we introduce the Minimal Assent Connection (MAC) and propose a new account of credence, which captures the strength of belief in language models. This proposal uniformly assigns strength of belief simply on the basis of model internal next token probabilities. We argue that rational norms tied to coherence do apply to some language models, but not to others. This issue is significant since rationality is closely tied to predicting and explaining behavior, and thus it is connected to considerations about AI safety and alignment, as well as understanding model behavior more generally.
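The abstract's core mechanism—reading off a credence from a model's internal next-token probabilities—can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the prompt template, the choice of assent/dissent tokens, and the toy probability table are all assumptions introduced here for illustration.

```python
def mac_credence(next_token_probs, assent_tokens=("Yes",), dissent_tokens=("No",)):
    """Estimate a MAC-style belief strength from a next-token distribution.

    Assumes the model was prompted with something like
    'Is the following claim true? <claim> Answer:' and that
    next_token_probs maps candidate next tokens to probabilities.
    Credence is the probability mass on assent tokens, renormalized
    over the assent and dissent tokens only.
    """
    p_assent = sum(next_token_probs.get(t, 0.0) for t in assent_tokens)
    p_dissent = sum(next_token_probs.get(t, 0.0) for t in dissent_tokens)
    if p_assent + p_dissent == 0:
        raise ValueError("no probability mass on assent/dissent tokens")
    return p_assent / (p_assent + p_dissent)


# Toy next-token distribution (assumed values, for illustration only)
probs = {"Yes": 0.6, "No": 0.2, "Maybe": 0.2}
print(mac_credence(probs))  # 0.6 / (0.6 + 0.2) = 0.75
```

Renormalizing over only the assent/dissent tokens keeps the measure well defined even when most probability mass falls on unrelated continuations; whether that is the right design choice is exactly the kind of question the paper's coherence norms are meant to adjudicate.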
Problem

Research questions and friction points this paper is trying to address.

Investigating whether language models adhere to rational coherence norms
Proposing new methods to measure belief strength in language models
Assessing implications for AI safety and model behavior understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Minimal Assent Connection for coherence norms
Assigns belief strength using token probabilities
Applies rational norms selectively to language models