🤖 AI Summary
This study investigates whether large language models rely on structured internal representations to support flexible in-context reasoning. Through causal mediation analysis, representational similarity analysis, and functional dissection of attention heads, the research reveals a conceptual subspace that emerges in the model's middle-to-late layers and remains stable across contexts. Attention heads in early-to-middle layers integrate contextual cues to construct this subspace, while later layers leverage it to generate predictions. The work provides the first causal evidence that this structured representation is functionally necessary for reasoning, and it traces how the representation is dynamically constructed and then used. These findings offer insight into how large language models achieve sophisticated reasoning.
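To make the causal-mediation logic concrete, the sketch below shows activation patching, the standard intervention behind such analyses: a hidden state recorded from one run is spliced into another run, and the resulting shift in the output quantifies that state's causal contribution. The toy model, layer index, and inputs are illustrative stand-ins, not the paper's actual setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a transformer's layer stack; in the paper's setting
# this would be a real LLM's residual stream. Everything here is illustrative.
class ToyModel(nn.Module):
    def __init__(self, d=16, n_layers=6, n_out=4):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_layers))
        self.head = nn.Linear(d, n_out)  # stand-in for next-token logits

    def forward(self, x, patch_layer=None, patch_value=None):
        for i, layer in enumerate(self.layers):
            x = torch.relu(layer(x))
            if i == patch_layer:      # causal intervention: overwrite this
                x = patch_value       # hidden state with one from another run
        return self.head(x)

model = ToyModel()
x_base, x_source = torch.randn(1, 16), torch.randn(1, 16)
layer_of_interest = 3

# 1) Clean "source" run: record the hidden state at the layer of interest.
h = x_source
source_act = None
for i, layer in enumerate(model.layers):
    h = torch.relu(layer(h))
    if i == layer_of_interest:
        source_act = h.detach()

# 2) Patched run: replay the base input, splicing in the source activation.
logits_base = model(x_base)
logits_patched = model(x_base, patch_layer=layer_of_interest,
                       patch_value=source_act)

# The size of the logit shift is the mediation effect of that hidden state.
print("max logit shift:", (logits_patched - logits_base).abs().max().item())
```

Applied to a real LLM, the same patch would target the middle-to-late-layer activations spanning the conceptual subspace, testing whether predictions actually depend on them.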
📝 Abstract
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured, human-like conceptual representations within these models, it remains unclear whether they functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context concept inference. Our results reveal a conceptual subspace that emerges in middle-to-late layers and whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but is functionally central to model predictions, establishing its causal role in inference. We further identify a layer-wise progression in which attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which later layers then leverage to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured latent representations in context for inference, offering insight into the computational processes underlying flexible adaptation.
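One way to operationalize "representational structure persists across contexts" is representational similarity analysis: build a representational dissimilarity matrix (RDM) over the same set of concepts in each context and rank-correlate the two geometries. The sketch below uses synthetic activations with illustrative dimensions, not the paper's data or models.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Synthetic stand-ins for one layer's activations of the same 8 concepts
# under two different contexts: shared structure plus context-specific noise.
shared = rng.normal(size=(8, 32))
acts_ctx_a = shared + 0.3 * rng.normal(size=(8, 32))
acts_ctx_b = shared + 0.3 * rng.normal(size=(8, 32))

# Condensed RDMs: pairwise correlation distance between concept vectors.
rdm_a = pdist(acts_ctx_a, metric="correlation")
rdm_b = pdist(acts_ctx_b, metric="correlation")

# RSA score: rank correlation between the two geometries. A high value
# means the concept geometry is stable across contexts.
rho, _ = spearmanr(rdm_a, rdm_b)
print(f"cross-context RSA (Spearman rho): {rho:.2f}")
```

Computed layer by layer, a profile of this score is one way such a cross-contextually stable subspace in middle-to-late layers could be identified.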