The Dual-Route Model of Induction

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the fine-grained mechanisms underlying context copying in large language models (LLMs), addressing how models reproduce input tokens or higher-level semantic units. We propose a dual-route induction mechanism (comprising *token-level* and *concept-level* induction pathways) identified and validated through attention analysis, head localization, and targeted ablation experiments. We identify two functionally dissociated classes of induction heads: token-level heads perform precise, per-token copying to ensure literal fidelity, while concept-level heads detect and reuse multi-token lexical units (e.g., phrases or named entities) to support semantic tasks such as word-level translation. Ablation results demonstrate their operational independence and functional specialization: removing the token-level pathway shifts behavior from verbatim copying to paraphrasing, underscoring the concept-level pathway's dominant role in semantic generalization. This study establishes a novel analytical paradigm for probing internal induction mechanisms in LLMs.
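The head-ablation methodology described above (zeroing the output of selected attention heads and observing the behavioral change) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function, weight names, and the choice of zero-ablation are illustrative assumptions.

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads, ablate_heads=()):
    """Toy causal multi-head self-attention over a (T, d) input.

    Heads listed in `ablate_heads` are zero-ablated: their output is
    replaced with zeros before the output projection, mimicking the
    kind of targeted head ablation used in interpretability studies.
    (Illustrative sketch only; real experiments often use mean ablation
    or activation patching on a trained model.)
    """
    T, d = x.shape
    dh = d // n_heads
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    heads = []
    for h in range(n_heads):
        sl = slice(h * dh, (h + 1) * dh)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(dh)
        # Causal mask: each position attends only to itself and the past.
        scores[np.triu(np.ones((T, T), dtype=bool), k=1)] = -np.inf
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        out = attn @ v[:, sl]
        if h in ablate_heads:
            out = np.zeros_like(out)  # zero-ablate this head's contribution
        heads.append(out)
    return np.concatenate(heads, axis=-1) @ Wo
```

Comparing the output with and without a head in `ablate_heads` isolates that head's contribution; in the paper's setting, the analogous intervention on token-level induction heads is what shifts the model from verbatim copying to paraphrasing.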

📝 Abstract
Prior work on in-context copying has shown the existence of induction heads, which attend to and promote individual tokens during copying. In this work we introduce a new type of induction head: concept-level induction heads, which copy entire lexical units instead of individual tokens. Concept induction heads learn to attend to the ends of multi-token words throughout training, working in parallel with token-level induction heads to copy meaningful text. We show that these heads are responsible for semantic tasks like word-level translation, whereas token induction heads are vital for tasks that can only be done verbatim, like copying nonsense tokens. These two "routes" operate independently: in fact, we show that ablation of token induction heads causes models to paraphrase where they would otherwise copy verbatim. In light of these findings, we argue that although token induction heads are vital for specific tasks, concept induction heads may be more broadly relevant for in-context learning.
Problem

Research questions and friction points this paper is trying to address.

Introduces concept-level induction heads for copying lexical units
Compares roles of token vs concept heads in semantic tasks
Shows independent operation of dual-route induction mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept-level induction heads copy lexical units
Parallel operation with token-level induction heads
Independent routes for semantic and verbatim tasks