🤖 AI Summary
This work investigates how Transformers acquire implicit multi-hop reasoning capabilities without explicit supervision of intermediate reasoning steps. Method: We train models from scratch in a controlled symbolic environment built upon atomic triples and introduce two diagnostic tools: cross-query semantic patching and a cosine-based representational lens. Contribution/Results: We identify a three-stage developmental trajectory in the emergence of reasoning capability. Crucially, we establish, for the first time, a strong correlation (r > 0.92) between cosine clustering in the latent space and reasoning success rate. We further demonstrate that second-hop generalization depends critically on query-level exposure to specific compositional structures. Our approach enables quantitative tracking of, and causal intervention in, implicit reasoning mechanisms, substantially enhancing the interpretability and controllability of multi-hop reasoning in large language models.
📝 Abstract
Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly -- producing correct answers without explicitly verbalizing intermediate steps -- but the underlying mechanisms remain poorly understood. In this paper, we study how such implicit reasoning emerges by training transformers from scratch in a controlled symbolic environment. Our analysis reveals a three-stage developmental trajectory: early memorization, followed by in-distribution generalization, and eventually cross-distribution generalization. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures. To interpret these behaviors, we introduce two diagnostic tools: cross-query semantic patching, which identifies semantically reusable intermediate representations, and a cosine-based representational lens, which reveals that successful reasoning correlates with cosine-based clustering in hidden space. This clustering phenomenon in turn provides a coherent explanation for the behavioral dynamics observed across training, linking representational structure to reasoning capability. These findings provide new insights into the interpretability of implicit multi-hop reasoning in LLMs, helping to clarify how complex reasoning processes unfold internally and offering pathways to enhance the transparency of such models.
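To make the cosine-based representational lens concrete, the sketch below shows one way a cosine-clustering score over hidden states could be computed. This is an illustrative assumption, not the paper's implementation: the function name `cosine_cluster_score` and the toy data are hypothetical, and the score is simply the mean pairwise cosine similarity of a set of hidden-state vectors (higher means tighter clustering).

```python
import numpy as np

def cosine_cluster_score(hidden_states: np.ndarray) -> float:
    """Mean pairwise cosine similarity over a (n, d) matrix of
    hidden-state vectors; higher values mean tighter cosine clustering.

    Hypothetical diagnostic, not the paper's actual lens.
    """
    # Normalize each vector to unit length so dot products are cosines.
    normed = hidden_states / np.linalg.norm(hidden_states, axis=1, keepdims=True)
    sims = normed @ normed.T  # (n, n) pairwise cosine similarities
    n = hidden_states.shape[0]
    # Average the off-diagonal entries (exclude self-similarity, which is 1).
    return float((sims.sum() - n) / (n * (n - 1)))

# Toy illustration: perturbations of one direction vs. unrelated directions.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
clustered = base + 0.05 * rng.normal(size=(16, 8))  # tight cluster
scattered = rng.normal(size=(16, 8))                # no shared direction

print(cosine_cluster_score(clustered))  # close to 1
print(cosine_cluster_score(scattered))  # close to 0
```

Under the paper's reported finding, a score like this computed over hidden states of successful queries would track reasoning accuracy across training checkpoints; correlating the two series per checkpoint is one plausible way the r > 0.92 figure could be reproduced.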