A circuit for predicting hierarchical structure in-context in Large Language Models

๐Ÿ“… 2025-09-25
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Standard induction heads in large language models (LLMs) operate context-independently, limiting their ability to model hierarchical, higher-order repetition patternsโ€”such as English noun selection after โ€œtheโ€, which depends on multi-level contextual constraints rather than simple historical co-occurrence. Method: We propose and empirically validate an *adaptive induction circuit* mechanism, wherein induction heads dynamically learn context-dependent relationships by integrating local and global contextual cues to support hierarchical prediction. Our approach combines synthetic hierarchical sequence tasks, fine-grained attention analysis, context-sensitivity probing, and natural-language analogy experiments. Contribution/Results: Empirical evaluation demonstrates that this mechanism significantly improves predictive accuracy on both synthetic benchmarks and real-world language data. Crucially, it provides the first interpretable circuit-level evidence that LLMs can adaptively model structured repetition patterns through mechanistically identifiable, context-sensitive induction circuits.

Technology Category

Application Category

๐Ÿ“ Abstract
Large Language Models (LLMs) excel at in-context learning, the ability to use information provided as context to improve prediction of future tokens. Induction heads have been argued to play a crucial role for in-context learning in Transformer Language Models. These attention heads make a token attend to successors of past occurrences of the same token in the input. This basic mechanism supports LLMs' ability to copy and predict repeating patterns. However, it is unclear if this same mechanism can support in-context learning of more complex repetitive patterns with hierarchical structure. Natural language is teeming with such cases: The article "the" in English usually prefaces multiple nouns in a text. When predicting which token succeeds a particular instance of "the", we need to integrate further contextual cues from the text to predict the correct noun. If induction heads naively attend to all past instances of successor tokens of "the" in a context-independent manner, they cannot support this level of contextual information integration. In this study, we design a synthetic in-context learning task, where tokens are repeated with hierarchical dependencies. Here, attending uniformly to all successor tokens is not sufficient to accurately predict future tokens. Evaluating a range of LLMs on these token sequences and natural language analogues, we find adaptive induction heads that support prediction by learning what to attend to in-context. Next, we investigate how induction heads themselves learn in-context. We find evidence that learning is supported by attention heads that uncover a set of latent contexts, determining the different token transition relationships. Overall, we not only show that LLMs have induction heads that learn, but offer a complete mechanistic account of how LLMs learn to predict higher-order repetitive patterns in-context.
Problem

Research questions and friction points this paper is trying to address.

Investigating induction heads' ability to handle hierarchical dependencies
Determining if LLMs can learn complex repetitive patterns in-context
Explaining how adaptive induction heads integrate contextual cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive induction heads learn contextual attention patterns
Attention heads uncover latent contexts for transitions
Mechanistic account of hierarchical pattern prediction learning
๐Ÿ”Ž Similar Papers
No similar papers found.