π€ AI Summary
This study addresses the unclear mechanisms by which large language models dynamically update their beliefs and navigate the underlying structure of their hypothesis space during in-context learning. Framing in-context learning as trajectory evolution within a low-dimensional conceptual belief space, the authors investigate story understanding through behavioral experiments, representational analyses, linear probing, and causal interventions. They demonstrate that belief updates occur on a structured low-dimensional manifold, with consistent geometric signatures observed both in behavioral responses and internal model representations. Furthermore, the work shows that targeted manipulations of internal representations enable predictable steering of belief trajectories, providing the first empirical validation of the low-dimensionality and intervenability of belief dynamics in large language models.
π Abstract
Large Language Models (LLMs) update their behavior in context, which can be viewed as a form of Bayesian inference. However, the structure of the latent hypothesis space over which this inference operates remains unclear. In this work, we propose that LLMs assign beliefs over a low-dimensional geometric space - a conceptual belief space - and that in-context learning corresponds to a trajectory through this space as beliefs are updated over time. Using story understanding as a natural setting for dynamic belief updating, we combine behavioral and representational analyses to study these trajectories. We find that (1) belief updates are well-described as trajectories on low-dimensional, structured manifolds; (2) this structure is reflected consistently in both model behavior and internal representations and can be decoded with simple linear probes to predict behavior; and (3) interventions on these representations causally steer belief trajectories, with effects that can be predicted from the geometry of the conceptual space. Together, our results provide a geometric account of belief dynamics in LLMs, grounding Bayesian interpretations of in-context learning in structured conceptual representations.