🤖 AI Summary
This work investigates how Transformers acquire implicit multi-hop reasoning capabilities without explicit supervision of intermediate reasoning steps. Method: We train models from scratch in a controlled symbolic environment built upon atomic triples and introduce two diagnostic tools: cross-query semantic patching and a cosine-based representational lens. Contribution/Results: We identify a three-stage developmental trajectory in the emergence of reasoning capability. Crucially, we establish, for the first time, a strong correlation (r > 0.92) between cosine clustering in the latent space and reasoning success rate. We further demonstrate that second-hop generalization depends critically on query-level exposure to specific compositional structures. Our approach enables quantitative tracking of, and causal intervention in, implicit reasoning mechanisms, substantially enhancing the interpretability and controllability of multi-hop reasoning in large language models.
📝 Abstract
Recent work suggests that large language models (LLMs) can perform multi-hop reasoning implicitly -- producing correct answers without explicitly verbalizing intermediate steps -- but the underlying mechanisms remain poorly understood. In this paper, we study how such implicit reasoning emerges by training transformers from scratch in a controlled symbolic environment. Our analysis reveals a three-stage developmental trajectory: early memorization, followed by in-distribution generalization, and eventually cross-distribution generalization. We find that training with atomic triples is not necessary but accelerates learning, and that second-hop generalization relies on query-level exposure to specific compositional structures. To interpret these behaviors, we introduce two diagnostic tools: cross-query semantic patching, which identifies semantically reusable intermediate representations, and a cosine-based representational lens, which reveals that successful reasoning correlates with cosine-based clustering in hidden space. This clustering phenomenon in turn provides a coherent explanation for the behavioral dynamics observed across training, linking representational structure to reasoning capability. These findings provide new insights into the interpretability of implicit multi-hop reasoning in LLMs, helping to clarify how complex reasoning processes unfold internally and offering pathways to enhance the transparency of such models.
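To make the cosine-based representational lens concrete, the sketch below shows one way a cosine-clustering score over hidden states could be computed. This is an illustrative assumption, not the paper's implementation: the function name `cosine_cluster_score` and the toy data are hypothetical, and the score is simply the mean pairwise cosine similarity of a set of hidden-state vectors (higher means tighter clustering).

```python
import numpy as np

def cosine_cluster_score(hidden_states: np.ndarray) -> float:
    """Mean pairwise cosine similarity over a (n, d) matrix of
    hidden-state vectors; higher values mean tighter cosine clustering.

    Hypothetical diagnostic, not the paper's actual lens.
    """
    # Normalize each vector to unit length so dot products are cosines.
    normed = hidden_states / np.linalg.norm(hidden_states, axis=1, keepdims=True)
    sims = normed @ normed.T  # (n, n) pairwise cosine similarities
    n = hidden_states.shape[0]
    # Average the off-diagonal entries (exclude self-similarity, which is 1).
    return float((sims.sum() - n) / (n * (n - 1)))

# Toy illustration: perturbations of one direction vs. unrelated directions.
rng = np.random.default_rng(0)
base = rng.normal(size=8)
clustered = base + 0.05 * rng.normal(size=(16, 8))  # tight cluster
scattered = rng.normal(size=(16, 8))                # no shared direction

print(cosine_cluster_score(clustered))  # close to 1
print(cosine_cluster_score(scattered))  # close to 0
```

Under the paper's reported finding, a score like this computed over hidden states of successful queries would track reasoning accuracy across training checkpoints; correlating the two series per checkpoint is one plausible way the r > 0.92 figure could be reproduced.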