Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) trained to simulate code execution tend to over-rely on complex data structures and algorithms, which undermines core logical reasoning. This paper proposes TeaR, an algorithm-guided framework for reasoning enhancement. TeaR combines fine-grained, algorithm-centric dataset construction with reinforcement learning to guide LLMs toward optimal reasoning paths without generating executable code. Crucially, it eliminates dependence on external executors and prioritizes learning and generalizing abstract logical structures. Evaluated across 17 benchmarks spanning mathematical, knowledge-based, coding, and logical reasoning tasks, TeaR achieves consistent improvements, including +35.9% on Qwen2.5-7B and +5.9% on R1-Distilled-7B, indicating substantially enhanced robustness and cross-task transferability in multi-domain reasoning.

📝 Abstract
Enhancing reasoning capabilities remains a central focus in the LLM research community. A promising direction involves requiring models to simulate code execution step-by-step to derive outputs for given inputs. However, as code is often designed for large-scale systems, direct application leads to over-reliance on complex data structures and algorithms, even for simple cases, resulting in overfitting to algorithmic patterns rather than core reasoning structures. To address this, we propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks, thereby improving general reasoning abilities. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning. The results consistently show significant performance improvements. Notably, TeaR achieves a 35.9% improvement on Qwen2.5-7B and 5.9% on R1-Distilled-7B.
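The setup the abstract describes, deriving a program's output by step-by-step reasoning without executing code, pairs naturally with a rule-verifiable reward. The sketch below is illustrative only: the task family, prompt wording, and answer-parsing rule are assumptions, not the paper's actual dataset or reward.

```python
import random

def make_task(seed: int):
    """Build one algorithm-centric task: the model must trace the steps
    mentally and state the final value, without writing or running code.
    (Hypothetical task family for illustration.)"""
    rng = random.Random(seed)
    nums = [rng.randint(1, 9) for _ in range(5)]
    prompt = (
        "Start with total = 0. For each number in "
        f"{nums}, add it if it is even, otherwise subtract it. "
        "What is the final total? Reason step by step; do not write code."
    )
    answer = sum(n if n % 2 == 0 else -n for n in nums)
    return prompt, answer

def reward(model_output: str, answer: int) -> float:
    """Rule-based verifiable reward: 1.0 iff the model's last token parses
    to the gold answer. No external executor is needed at training time."""
    try:
        return 1.0 if int(model_output.strip().split()[-1]) == answer else 0.0
    except (ValueError, IndexError):
        return 0.0
```

Because the gold answer is computed once at data-construction time, the RL loop only compares strings to integers, which is what lets this style of training drop the external code executor entirely.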
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning without complex code reliance
Overcoming overfitting to algorithmic patterns in reasoning
Improving general reasoning through curated data and reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning for reasoning enhancement
Data curation to avoid overfitting
Code-related tasks for general reasoning
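One way to picture the curation idea above (keeping tasks that exercise reasoning rather than heavy data structures) is a simple filter over candidate samples. The field names, the trace-length threshold, and the filter rule here are all assumptions for illustration, not the paper's actual criteria.

```python
def curate(samples, max_trace_steps=12):
    """Keep only candidate tasks whose reference solution trace is short
    enough to follow mentally and that need no external executor.
    (Hypothetical filter; field names and threshold are assumptions.)"""
    return [
        s for s in samples
        if len(s["trace"]) <= max_trace_steps and not s["needs_external_exec"]
    ]
```

A filter like this would bias the training pool toward problems solvable by abstract step tracing, which matches the stated goal of avoiding overfitting to algorithmic patterns.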