Your Models Have Thought Enough: Training Large Reasoning Models to Stop Overthinking

📅 2025-09-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) often incur redundant computation and excessively long inference paths due to the absence of early termination mechanisms. To address this, we propose Just-Enough Thinking (JET), a reinforcement learning framework inspired by evidence accumulation models. JET explicitly models the “when-to-stop” decision via trajectory truncation and a length-aware reward that penalizes unnecessary reasoning steps while rewarding high-quality intermediate reasoning outcomes. Crucially, JET enables models to learn dynamic, input-adaptive stopping policies end-to-end during training—without requiring post-hoc heuristics or external supervision. On challenging reasoning benchmarks including Olympiad, JET reduces average output length by 46.3% while improving accuracy by 4.6%, marking the first demonstration of concurrent gains in both inference efficiency and reasoning fidelity. This establishes JET as a learnable, generalizable paradigm for adaptive termination in deep reasoning systems.

📝 Abstract
Large Reasoning Models (LRMs) have achieved impressive performance on challenging tasks, yet their deep reasoning often incurs substantial computational costs. To achieve efficient reasoning, existing reinforcement learning methods still struggle to construct short reasoning paths during the rollout stage, limiting effective learning. Inspired by Evidence Accumulation Models, we find that LRMs have accumulated sufficient information early in reasoning, making further reasoning steps redundant. Based on this insight, we propose Just-Enough Thinking (JET), which trains models to proactively terminate unnecessary reasoning. JET performs trajectory truncation during rollout to expose the model to short, distributionally consistent reasoning paths. In addition, it uses a quality-controlled length reward to encourage concise reasoning while maintaining correctness. Extensive experiments demonstrate that JET significantly improves reasoning efficiency without sacrificing accuracy. In particular, DeepSeek-Distill-Qwen-1.5B achieves a 4.6% accuracy gain while reducing output length by 46.3% on the Olympiad benchmark. Our code is available on GitHub.
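The trajectory-truncation idea in the abstract can be illustrated with a minimal sketch. The function name, the `</think>` stop marker, and the random-cut strategy below are assumptions for illustration, not the paper's actual implementation: the point is only that shorter rollouts are built as prefixes of the model's own full reasoning trace, so they stay distributionally consistent with what the model generates.

```python
import random

def truncate_rollout(tokens, stop_token="</think>", n_cuts=2):
    """Hypothetical sketch of JET-style trajectory truncation.

    Given one full reasoning rollout (a token list), cut it at
    earlier points and append a stop marker, yielding shorter
    reasoning paths that are prefixes of the model's own trace.
    """
    # Sample cut positions strictly inside the trajectory.
    cut_points = sorted(random.sample(range(1, len(tokens)), n_cuts))
    # Each truncated path keeps the original prefix and stops early.
    return [tokens[:c] + [stop_token] for c in cut_points]
```

In training, each truncated path would then be completed with an answer attempt and scored, so the model is exposed to short rollouts it could plausibly have produced itself.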
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs in deep reasoning models
Training models to terminate unnecessary reasoning steps
Improving reasoning efficiency without sacrificing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains models to proactively terminate unnecessary reasoning steps
Performs trajectory truncation for distributionally consistent short paths
Uses quality-controlled length reward to maintain correctness
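The quality-controlled length reward above can be sketched as follows. The function shape and the `alpha` weighting are assumptions for illustration (the paper's exact reward is not reproduced here); the key property is that length is only rewarded when the answer is correct, so the model cannot trade accuracy for brevity.

```python
def jet_style_reward(correct: bool, length: int, max_length: int,
                     alpha: float = 0.5) -> float:
    """Hypothetical sketch of a quality-controlled length reward.

    Incorrect answers get no reward at all, so shortening a wrong
    trajectory never pays off. Correct answers get a base reward
    plus a bonus that grows as the reasoning gets shorter.
    """
    if not correct:
        return 0.0  # quality gate: no length bonus without correctness
    base = 1.0
    # Linear bonus in [0, alpha]: shorter correct reasoning scores higher.
    length_bonus = alpha * (1.0 - min(length, max_length) / max_length)
    return base + length_bonus
```

Under this shape, a short correct answer strictly dominates a long correct one, and any correct answer dominates any incorrect one.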
Jinyi Han
Knowledge Works Lab
Large Language Model
Ying Huang
School of Data Science, Fudan University
Ying Liao
Associate Professor of Operations and Supply Chain Management, East Carolina University
supply chain strategy & management, knowledge management, and innovation management
Zishang Jiang
School of Data Science, Fudan University
Xikun Lu
Shanghai Institute of Artificial Intelligence for Education, East China Normal University
Haiquan Zhao
Alibaba Group
LLM Safety
Xinyi Wang
School of Data Science, Fudan University
Guanghao Zhou
Shanghai Institute of Artificial Intelligence for Education, East China Normal University
Sihang Jiang
Fudan University
Knowledge Graph, Large Language Models
Jiaqing Liang
Fudan University
knowledge graph, deep learning
Weikang Zhou
Antgroup
Zeye Sun
Antgroup
Fei Yu
Antgroup
Yanghua Xiao
College of Computer Science and Artificial Intelligence, Fudan University