🤖 AI Summary
This study investigates how context influences the vector representations of statement truthfulness in large language models (LLMs), with a focus on geometric changes in direction and magnitude within the activation space. Through residual stream analysis across multiple models and datasets, we systematically characterize the geometric transformations of truth vectors before and after contextual modulation, revealing for the first time consistent patterns in directional angles (θ) and relative magnitudes. Our findings show that context generally amplifies the magnitude of truth vectors, that larger models primarily rely on directional shifts to distinguish relevant contexts, and that conflicting contexts induce more pronounced geometric perturbations. This work offers a novel geometric perspective on the internal mechanisms by which LLMs represent truthfulness.
📝 Abstract
Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior work; however, how they change when context is introduced remains unexplored. We study this question by measuring (1) the directional change ($\theta$) between the truth vectors with and without context and (2) the relative magnitude of the truth vector upon adding context. Across four LLMs and four datasets, we find that (1) the truth vectors with and without context are roughly orthogonal in early layers and become more aligned in middle layers, after which the alignment either stabilizes or continues to increase; (2) adding context generally increases the truth vector magnitude, i.e., it amplifies the separation between true and false representations in the activation space; (3) larger models distinguish relevant from irrelevant context mainly through directional change ($\theta$), while smaller models show this distinction through magnitude differences. We also find that context conflicting with parametric knowledge produces larger geometric changes than parametrically aligned context. To the best of our knowledge, this is the first work to provide a geometric characterization of how context transforms the truth vector in the activation space of LLMs.
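The two quantities the abstract describes, the angle $\theta$ between truth vectors with and without context and the ratio of their magnitudes, can be sketched as below. This is a minimal illustration, not the paper's code: the difference-of-means construction of the truth vector and the random arrays standing in for residual-stream activations are assumptions.

```python
import numpy as np

def truth_vector(true_acts, false_acts):
    # Illustrative difference-of-means truth vector: mean activation of
    # true statements minus mean activation of false statements.
    return true_acts.mean(axis=0) - false_acts.mean(axis=0)

def directional_change(v_no_ctx, v_ctx):
    # Angle theta (in degrees) between the truth vectors computed
    # without and with context; 90 degrees means roughly orthogonal.
    cos = np.dot(v_no_ctx, v_ctx) / (np.linalg.norm(v_no_ctx) * np.linalg.norm(v_ctx))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def relative_magnitude(v_no_ctx, v_ctx):
    # Norm ratio ||v_ctx|| / ||v_no_ctx||; a value above 1 means context
    # amplifies the true/false separation in activation space.
    return np.linalg.norm(v_ctx) / np.linalg.norm(v_no_ctx)

# Toy data standing in for residual-stream activations at one layer.
rng = np.random.default_rng(0)
d = 64
v_plain = truth_vector(rng.normal(size=(100, d)), rng.normal(size=(100, d)))
v_with_ctx = truth_vector(rng.normal(1.0, 1.0, size=(100, d)),
                          rng.normal(-1.0, 1.0, size=(100, d)))
print(directional_change(v_plain, v_with_ctx))
print(relative_magnitude(v_plain, v_with_ctx))
```

In the actual study, `true_acts` and `false_acts` would be residual-stream activations collected at a fixed layer, and the two measurements would be repeated per layer to produce the early/middle/late-layer trends described above.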