The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

📅 2026-02-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of mechanistic understanding regarding how beliefs—formalized as posterior distributions—are encoded, updated, and influenced by interventions within the representation space of large language models (LLMs). Focusing on Llama-3.2, we study a contextual inference task in which the model implicitly estimates parameters of a normal distribution, enabling systematic analysis of the emergent “belief manifold” in its representational geometry and dynamics. We introduce Linear Field Probing (LFP) to uncover the underlying nonlinear manifold and design geometrically and field-aware intervention strategies. Experiments demonstrate that conventional linear steering often pushes representations off the belief manifold, yielding unnatural shifts, whereas interventions grounded in the manifold’s intrinsic geometry successfully preserve the structural coherence of the belief family, providing evidence for structured and geometrically consistent belief representations within LLMs.

📝 Abstract
Large language models (LLMs) represent prompt-conditioned beliefs (posteriors over answers and claims), but we lack a mechanistic account of how these beliefs are encoded in representation space, how they update with new evidence, and how interventions reshape them. We study a controlled setting in which Llama-3.2 generates samples from a normal distribution by implicitly inferring its parameters (mean and standard deviation) given only samples from that distribution in context. We find that curved "belief manifolds" for these parameters form in representation space given sufficient in-context learning, and we study how the model adapts when the distribution suddenly changes. While standard linear steering often pushes the model off-manifold and induces coupled, out-of-distribution shifts, geometry- and field-aware steering better preserves the intended belief family. Our work demonstrates linear field probing (LFP) as a simple approach to tile the data manifold and make interventions that respect the underlying geometry. We conclude that rich structure emerges naturally in LLMs and that purely linear concept representations are often an inadequate abstraction.
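The abstract's central contrast, that adding a fixed steering vector can push a representation off a curved belief manifold while a geometry-aware step stays on it, can be illustrated with a toy sketch. This is a hypothetical example, not the paper's method or data: the "belief manifold" here is just the unit circle in the plane, and the "concept vector" is made up.

```python
import numpy as np

# Toy stand-in for a curved belief manifold: the unit circle in R^2.
# (Hypothetical; the paper's manifolds live in LLM representation space.)
theta = 0.3
x = np.array([np.cos(theta), np.sin(theta)])  # a point on the manifold

# A fixed "concept vector" for linear steering (also hypothetical).
v = np.array([0.0, 0.5])

# 1) Naive linear steering: add the vector directly.
x_linear = x + v
off_manifold = abs(np.linalg.norm(x_linear) - 1.0)  # distance from the circle

# 2) Geometry-aware steering: project the vector onto the tangent space
#    at x, take the step, then retract back onto the manifold.
tangent = np.array([-x[1], x[0]])       # unit tangent to the circle at x
step = (v @ tangent) * tangent          # tangent-projected component of v
x_geo = (x + step) / np.linalg.norm(x + step)  # retraction (renormalize)
on_manifold = abs(np.linalg.norm(x_geo) - 1.0)

print(f"linear steering leaves the manifold by {off_manifold:.3f}")
print(f"geometry-aware steering leaves the manifold by {on_manifold:.3f}")
```

The linear step lands measurably off the circle, while the tangent-project-and-retract step stays on it exactly, which is the qualitative behavior the abstract attributes to field-aware interventions.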
Problem

Research questions and friction points this paper addresses.

belief representation
posterior geometry
language models
representation manifolds
belief updating
Innovation

Methods, ideas, or system contributions that make the work stand out.

belief manifolds
representation geometry
linear field probing
in-context learning
model steering