Implicit Reasoning in Transformers is Reasoning through Shortcuts

📅 2025-03-10

🤖 AI Summary
This work investigates why implicit reasoning in large language models (LLMs) fails to generalize, focusing on multi-step mathematical tasks. The authors train GPT-2 from scratch on a custom multi-step reasoning dataset and analyze how the model learns. They find that implicit reasoning degenerates into shortcut learning. When trained on fixed input-output patterns, models achieve high in-domain and out-of-domain accuracy (>90%) yet collapse under minor pattern perturbations. When trained on unfixed patterns, models overfit to one specific pattern and fail to generalize further. The same limitation is also observed in state-of-the-art LLMs. The main contributions are: (1) identifying shortcut learning as the mechanism behind implicit reasoning failure; (2) linking pattern stability in the training data to generalization capability; and (3) evaluating implicit reasoning through controllable data pattern design.

📝 Abstract
Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. However, why does the advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) Language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning. However, this capability only emerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning abilities emerging from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further. Notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.
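To make the fixed-pattern versus unfixed-pattern distinction concrete, here is a minimal Python sketch of how such multi-step arithmetic problems could be generated. It is an illustration under stated assumptions, not the paper's dataset: the prompt format, the helper names `make_problem` and `solve`, and the reading of "fixed pattern" as "premises listed in computation order" (with "unfixed" meaning shuffled premises) are all hypothetical.

```python
import random
import string


def make_problem(num_steps, fixed_pattern, rng):
    """Build one multi-step problem and return (prompt, answer).

    Assumption: 'fixed pattern' means premises appear in the order they
    must be computed; 'unfixed' shuffles them. Format is hypothetical.
    """
    names = rng.sample(string.ascii_lowercase, num_steps + 1)
    value = rng.randint(0, 9)
    premises = [f"{names[0]}={value}"]
    # Each step defines a new variable from the previous one.
    for prev, cur in zip(names, names[1:]):
        operand = rng.randint(1, 9)
        if rng.random() < 0.5:
            value += operand
            premises.append(f"{cur}={prev}+{operand}")
        else:
            value -= operand
            premises.append(f"{cur}={prev}-{operand}")
    if not fixed_pattern:
        rng.shuffle(premises)  # break the computation-order pattern
    prompt = ",".join(premises) + f",{names[-1]}=?"
    return prompt, value


def solve(prompt):
    """Reference solver that does not rely on premise order."""
    *premises, query = prompt.split(",")
    target = query[:-2]  # strip the trailing "=?"
    known = {}
    pending = [p.split("=") for p in premises]
    while target not in known:
        for lhs, rhs in pending:
            if rhs.isdigit():
                known[lhs] = int(rhs)
            elif rhs[0] in known:
                sign = 1 if rhs[1] == "+" else -1
                known[lhs] = known[rhs[0]] + sign * int(rhs[2:])
    return known[target]


if __name__ == "__main__":
    rng = random.Random(0)
    for fixed in (True, False):
        prompt, answer = make_problem(3, fixed, rng)
        print("fixed" if fixed else "unfixed", prompt, "->", answer)
```

In implicit reasoning the model would have to emit the final answer directly after the prompt, without writing out intermediate values. A model that has only learned to walk the premises left to right would handle the fixed-pattern prompts but not the shuffled ones, which is the kind of shortcut the abstract describes.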
Problem

Research questions and friction points this paper is trying to address.

Investigates why implicit reasoning fails in language models.
Explores shortcut learning in multi-step reasoning tasks.
Examines generalization issues in fixed vs. unfixed-pattern training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

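Implicit reasoning is more inference-efficient than explicit reasoning, requiring fewer generated tokens.
Implicit step-by-step reasoning generalizes in-domain and out-of-domain only when the model is trained on fixed-pattern data.
Shortcut learning explains why implicit reasoning works on similar patterns but fails to generalize further.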