Standards for Belief Representations in LLMs

📅 2024-05-31
🏛️ Minds Mach.
📈 Citations: 5
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of a unified theoretical foundation for belief representation in large language models (LLMs), which leads to inconsistency and opacity in their internal belief states. We propose the first formalized standard framework for belief representation, defining three core properties: semantic consistency, dynamic updatability, and causal traceability. Methodologically, we integrate doxastic logic (DoX), neuro-symbolic interfaces, inter-layer attention attribution, and counterfactual belief editing to enable verifiable modeling and controllable intervention of LLMs’ latent belief states. Evaluated on BELIEF-BENCH, our approach improves belief consistency accuracy by 32.7% and supports fine-grained belief injection and withdrawal. The framework establishes the first auditable, cross-task generalizable cognitive infrastructure for trustworthy AI, effectively breaking the “belief black box” limitation of LLMs.
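The summary describes the method only at a high level. As a rough, non-authoritative sketch of what a belief-consistency measurement could look like in practice, the snippet below fits a linear probe on hidden states and scores agreement across paraphrase pairs. Everything in it, including the function names `train_belief_probe` and `belief_consistency`, the least-squares probe, and the 0.5 threshold, is our illustrative assumption, not the paper's implementation or the BELIEF-BENCH protocol.

```python
import numpy as np

# Illustrative sketch only: a linear "belief probe" over hidden states and a
# paraphrase-consistency score. Names, shapes, and thresholds are assumptions,
# not the paper's method.

def train_belief_probe(hidden_states, labels):
    """Least-squares fit of a direction w so that h @ w approximates the label."""
    X = np.asarray(hidden_states, dtype=float)  # (n_statements, hidden_dim)
    y = np.asarray(labels, dtype=float)         # 1.0 = believed true, 0.0 = believed false
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w                                    # (hidden_dim,)

def belief_consistency(states_a, states_b, w, threshold=0.5):
    """Fraction of statement/paraphrase pairs the probe scores the same way."""
    beliefs_a = (np.asarray(states_a, dtype=float) @ w) > threshold
    beliefs_b = (np.asarray(states_b, dtype=float) @ w) > threshold
    return float(np.mean(beliefs_a == beliefs_b))

# Tiny synthetic demo: lightly perturbed copies of the same states should score high.
rng = np.random.default_rng(0)
H = rng.normal(size=(100, 16))
w = train_belief_probe(H, (H[:, 0] > 0).astype(float))
print(belief_consistency(H, H + 0.01 * rng.normal(size=H.shape), w))
```

A real evaluation would replace the synthetic arrays with hidden states extracted from the model on matched statement/paraphrase pairs.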

Problem

Research questions and friction points this paper is trying to address.

Lack of a unified theory of belief representation in LLMs.
Need for criteria that specify what counts as a belief-like representation in an LLM.
Traditional methods for attributing and measuring beliefs do not transfer straightforwardly to LLMs.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes adequacy conditions for belief-like representations in LLMs.
Establishes four criteria: accuracy, coherence, uniformity, and use (sketched below).
Integrates philosophy and machine learning to ground belief measurement.
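The contribution here is conceptual rather than algorithmic, so the following is only a minimal sketch of how the four criteria might be wired into an evaluation harness. The class name, the [0, 1] scores, and the 0.8 threshold are all hypothetical choices of ours, not anything the authors prescribe.

```python
from dataclasses import dataclass

@dataclass
class AdequacyReport:
    """Hypothetical scorecard for the four adequacy conditions (scores in [0, 1])."""
    accuracy: float    # decoded beliefs track the truth of probed statements
    coherence: float   # decoded beliefs respect logical relations between statements
    uniformity: float  # one decoding scheme works across prompts, tasks, and layers
    use: float         # decoded beliefs help predict the model's downstream behavior

    def passes(self, threshold: float = 0.8) -> bool:
        """Treat a representation as belief-like only if every condition clears the bar."""
        return min(self.accuracy, self.coherence, self.uniformity, self.use) >= threshold

report = AdequacyReport(accuracy=0.91, coherence=0.84, uniformity=0.77, use=0.88)
print(report.passes())  # False: uniformity falls below the illustrative 0.8 bar
```

Taking the minimum rather than an average reflects a conjunctive reading of the criteria: a representation that fails any one condition should not count as belief-like.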
Daniel A. Herrmann
University of Groningen
B. A. Levinstein
University of Illinois at Urbana-Champaign