🤖 AI Summary
This work investigates whether language models can perform implicit k-hop reasoning (k = 2, 3, 4), solving multi-hop tasks in a single forward pass without chain-of-thought prompting. Method: We train GPT-2-style models from scratch on controlled synthetic datasets to systematically assess feasibility and scaling behavior. Contribution/Results: We find that implicit k-hop reasoning requires training data that grows exponentially with hop count (∝cᵏ) and model depth that grows linearly (∝k). Curriculum learning reduces the sample requirement for 4-hop tasks by roughly 40%, but does not remove the exponential data bottleneck. A theoretical analysis attributes this to the coupling of combinatorial path explosion in multi-hop reasoning with the limited inter-layer information propagation of Transformers. Our results confirm that implicit multi-hop reasoning is learnable in principle, yet subject to an inherent trade-off between data efficiency and model scale, which imposes hard limits on practical deployment.
📝 Abstract
Implicit reasoning is the ability of a language model to solve multi-hop reasoning tasks in a single forward pass, without chain of thought. We investigate this capability using GPT-2-style language models trained from scratch on controlled $k$-hop reasoning datasets ($k = 2, 3, 4$). We show that while such models can indeed learn implicit $k$-hop reasoning, the required amount of training data grows exponentially in $k$, and the required number of transformer layers grows linearly in $k$. We offer a theoretical explanation for why this depth growth is necessary. We further find that the data requirement can be mitigated, but not eliminated, through curriculum learning.
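The abstract does not spell out how the controlled $k$-hop datasets are built. As a rough illustration of the general idea, a $k$-hop sample can be constructed by composing $k$ random mappings over a small entity set, so that answering correctly requires chaining all $k$ lookups internally. This is a minimal sketch under that assumption; `make_khop_dataset` and its parameter names are hypothetical, not the paper's actual data pipeline:

```python
import random

def make_khop_dataset(num_entities, k, num_samples, seed=0):
    """Generate synthetic k-hop composition samples.

    Each hop is a random function f_i over a small entity set.
    A sample pairs a start entity with the k-fold composition
    f_k(...f_2(f_1(start))...), which a model must resolve in
    one forward pass to answer without chain of thought.
    """
    rng = random.Random(seed)
    entities = list(range(num_entities))
    # One random total mapping per hop.
    hops = [{e: rng.choice(entities) for e in entities} for _ in range(k)]
    samples = []
    for _ in range(num_samples):
        start = rng.choice(entities)
        x = start
        for f in hops:  # apply the hops left to right
            x = f[x]
        samples.append((start, x))
    return hops, samples
```

With `num_entities` entities and `k` hops there are on the order of `num_entities**k` distinct reasoning paths, which is one intuition for why the training-data requirement can scale exponentially in `k`.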