TERMINATOR: Learning Optimal Exit Points for Early Stopping in Chain-of-Thought Reasoning

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key inefficiency of large reasoning models: in chain-of-thought (CoT) reasoning, excessive deliberation wastes compute because there is no automatic mechanism for deciding when to stop. To this end, we propose TERMINATOR, a task- and model-adaptive early-stopping strategy that truncates redundant reasoning by predicting the position at which the correct answer is first generated. We construct the first dataset of optimal reasoning lengths, labeled by the initial occurrence of the correct answer, and train an early-stopping controller on it via CoT trajectory analysis and supervised learning. Evaluated on four benchmarks—MATH-500, AIME 2025, HumanEval, and GPQA—TERMINATOR reduces reasoning length by 14%–55% on average, significantly outperforming existing approaches.
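The data-construction step described above (find the earliest reasoning step whose intermediate answer already matches the gold answer, then use that position as the exit label) might be sketched as follows. This is a minimal illustration, not the paper's implementation; all function names and the dataset layout are assumptions.

```python
# Hypothetical sketch of TERMINATOR-style training-data construction.
# For each CoT trajectory, locate the first step whose extracted answer
# equals the gold answer; that position is the "optimal exit" label.

def first_correct_position(step_answers, gold):
    """Return the 1-based index of the first reasoning step whose
    extracted answer matches the gold answer, or None if never correct."""
    for i, ans in enumerate(step_answers, start=1):
        if ans == gold:
            return i
    return None

def build_exit_dataset(trajectories):
    """trajectories: list of (step_answers, gold_answer) pairs.
    Keeps only trajectories that eventually reach the gold answer,
    labeling each with its optimal exit step and total length."""
    dataset = []
    for step_answers, gold in trajectories:
        pos = first_correct_position(step_answers, gold)
        if pos is not None:
            dataset.append({"exit_step": pos,
                            "total_steps": len(step_answers)})
    return dataset

# Toy example: the first trajectory reaches the right answer at step 2
# of 4, so half of its reasoning after that point is redundant.
data = build_exit_dataset([
    (["7", "12", "12", "12"], "12"),
    (["3", "3", "5"], "5"),
])
# → [{'exit_step': 2, 'total_steps': 4}, {'exit_step': 3, 'total_steps': 3}]
```

A supervised early-stopping controller would then be trained to predict `exit_step` (or a stop/continue decision at each step) from features of the partial trajectory.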

📝 Abstract
Large Reasoning Models (LRMs) achieve impressive performance on complex reasoning tasks via Chain-of-Thought (CoT) reasoning, which enables them to generate intermediate thinking tokens before arriving at the final answer. However, LRMs often suffer from significant overthinking, spending excessive compute time even after the answer is generated early on. Prior work has identified the existence of an optimal reasoning length such that truncating reasoning at this point significantly shortens CoT outputs with virtually no change in performance. However, determining optimal CoT lengths for practical datasets is highly non-trivial as they are fully task and model-dependent. In this paper, we precisely address this and design TERMINATOR, an early-exit strategy for LRMs at inference to mitigate overthinking. The central idea underpinning TERMINATOR is that the first arrival of an LRM's final answer is often predictable, and we leverage these first answer positions to create a novel dataset of optimal reasoning lengths to train TERMINATOR. Powered by this approach, TERMINATOR achieves significant reductions in CoT lengths of 14%-55% on average across four challenging practical datasets: MATH-500, AIME 2025, HumanEval, and GPQA, whilst outperforming current state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

early stopping
Chain-of-Thought reasoning
overthinking
optimal reasoning length
Large Reasoning Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

early-exit
Chain-of-Thought reasoning
optimal stopping
Large Reasoning Models
overthinking mitigation