Language models can learn implicit multi-hop reasoning, but only if they have lots of training data

📅 2025-05-23
🤖 AI Summary
This work investigates whether language models can perform k-hop reasoning (k = 2, 3, 4) implicitly, in a single forward pass without chain-of-thought prompting. Method: GPT-2 architectures are trained from scratch on controlled synthetic datasets to systematically assess feasibility and scaling behavior. Contribution/Results: We show, for the first time, that implicit k-hop reasoning is learnable, but the required training data grows exponentially with hop count (∝cᵏ) while the required model depth grows linearly (∝k). Curriculum learning reduces sample requirements for 4-hop tasks by roughly 40%, yet does not overcome the fundamental exponential data bottleneck. A theoretical analysis attributes this phenomenon to the coupling of combinatorial path explosion in multi-hop reasoning with constrained inter-layer information propagation in Transformers. The results confirm that implicit multi-hop reasoning is learnable in principle, but subject to an inherent trade-off between data efficiency and model scale, imposing hard limits on practical deployment.
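The controlled k-hop setup described in the summary can be illustrated with a toy data generator. This is a minimal sketch under stated assumptions, not the paper's actual pipeline: the entity names, the `hop{i}(x) = y` fact format, and the composed-query format are all hypothetical choices for illustration.

```python
import random

def make_khop_dataset(k, n_entities=100, n_examples=5):
    """Build a toy k-hop dataset: k random mappings (hops) over an
    entity vocabulary, plus query/answer pairs that compose them.
    (Illustrative sketch; formats are assumptions, not the paper's.)"""
    entities = [f"e{i}" for i in range(n_entities)]
    # One random total function per hop (think "parent_of", "capital_of", ...).
    hops = [{e: random.choice(entities) for e in entities} for _ in range(k)]

    # Facts: every single-hop edge, stated explicitly in the training data.
    facts = [f"hop{h}({e}) = {m[e]}" for h, m in enumerate(hops) for e in m]

    # Queries: compose all k hops; an "implicit" reasoner must answer
    # these in a single forward pass, without writing out intermediate hops.
    queries = []
    for e in random.sample(entities, n_examples):
        x = e
        for m in hops:
            x = m[x]
        queries.append((f"hop{k-1}(...hop0({e})...) = ?", x))
    return facts, queries
```

Because each hop is an independent random mapping, the number of distinct k-hop compositions explodes combinatorially with k, which is the intuition behind the exponential data requirement the summary reports.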

📝 Abstract
Implicit reasoning is the ability of a language model to solve multi-hop reasoning tasks in a single forward pass, without chain of thought. We investigate this capability using GPT2-style language models trained from scratch on controlled $k$-hop reasoning datasets ($k = 2, 3, 4$). We show that while such models can indeed learn implicit $k$-hop reasoning, the required training data grows exponentially in $k$, and the required number of transformer layers grows linearly in $k$. We offer a theoretical explanation for why this depth growth is necessary. We further find that the data requirement can be mitigated, but not eliminated, through curriculum learning.
Problem

Research questions and friction points this paper is trying to address.

Investigates implicit multi-hop reasoning in language models
Examines exponential data growth for k-hop reasoning tasks
Explores curriculum learning to mitigate data requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT2-style models learn implicit multi-hop reasoning
Exponential data growth required with hop count
Curriculum learning mitigates data requirements partially
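The curriculum idea in the last bullet can be sketched as a staged training schedule. The stage structure, step counts, and hop mixing below are hypothetical illustrations, not the paper's exact recipe.

```python
def curriculum_schedule(max_hops=4, steps_per_stage=1000):
    """Hypothetical staged curriculum: begin with 2-hop examples and add
    one deeper hop count per stage, so easier compositions are learned
    before harder ones are mixed in. (Sketch; parameters are assumptions.)"""
    return [
        {"train_on_hops": list(range(2, k + 1)), "steps": steps_per_stage}
        for k in range(2, max_hops + 1)
    ]
```

Per the summary, a schedule along these lines cut 4-hop sample requirements by about 40% in the paper's experiments, without removing the exponential growth in data with hop count.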
Yuekun Yao
Saarland University
Yupei Du
Utrecht University
Dawei Zhu
Saarland University
Michael Hahn
Saarland University
Alexander Koller
Professor of Computational Linguistics, Saarland University, Saarland Informatics Campus
Computational linguistics · Artificial intelligence