🤖 AI Summary
This work addresses the inefficiency of large reasoning models that often continue generating redundant reasoning steps even after arriving at the correct answer, leading to unnecessary computational overhead. To mitigate this, the authors propose an efficient early-stopping mechanism that enables token-level control over the reasoning process. The approach combines a trajectory classifier to identify valid termination points with supervised fine-tuning using self-generated <stop> tokens and stop-aware reinforcement learning driven by computation-aware rewards. Evaluated on four reasoning benchmarks, the method reduces average reasoning length by 3.7× (from 4799 to 1290 tokens) while maintaining stable accuracy (74.9% vs. 74.2%) and demonstrates strong cross-domain generalization capabilities.
📝 Abstract
Large reasoning models (LRMs) achieve state-of-the-art performance by generating long chains-of-thought, but often waste computation on redundant reasoning after the correct answer has already been reached. We introduce Early-Stopping for Token-Aware Reasoning (ESTAR), which detects and reduces such reasoning redundancy to improve efficiency without sacrificing accuracy. Our method combines (i) a trajectory-based classifier that identifies when reasoning can be safely stopped, (ii) supervised fine-tuning to teach LRMs to propose self-generated `<stop>` signals, and (iii) stop-aware reinforcement learning that truncates rollouts at self-generated stop points with compute-aware rewards. Experiments on four reasoning datasets show that ESTAR reduces reasoning length by about 3.7× (from 4,799 to 1,290 tokens) while preserving accuracy (74.9% vs. 74.2%), with strong cross-domain generalization. These results highlight early stopping as a simple yet powerful mechanism for improving reasoning efficiency in LRMs.
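The two core ingredients of the abstract above can be sketched in a few lines: truncating a reasoning trace at a self-generated stop signal, and a reward that trades correctness against tokens spent. This is a minimal illustrative sketch, not the paper's implementation; the `"<stop>"` marker string, the reward weights, and both function names are assumptions.

```python
# Illustrative sketch of stop-token truncation and a compute-aware reward.
# The "<stop>" marker, the 0.5 length weight, and the 4096 budget are
# hypothetical choices, not values taken from the paper.

def truncate_at_stop(tokens, stop_token="<stop>"):
    """Cut a reasoning trace at the first self-generated stop signal."""
    if stop_token in tokens:
        return tokens[: tokens.index(stop_token)]
    return tokens

def compute_aware_reward(is_correct, n_tokens, budget=4096, length_weight=0.5):
    """Reward correctness, minus a penalty that grows with tokens used."""
    correctness = 1.0 if is_correct else 0.0
    length_penalty = length_weight * min(n_tokens / budget, 1.0)
    return correctness - length_penalty

# A rollout that keeps reasoning past the answer: everything after
# "<stop>" is redundant and is discarded before scoring.
trace = ["step1", "step2", "answer", "<stop>", "recheck", "recheck"]
kept = truncate_at_stop(trace)
reward = compute_aware_reward(is_correct=True, n_tokens=len(kept))
```

Under this shaping, a correct but shorter rollout earns a higher reward than an equally correct longer one, which is the pressure that drives the reported length reduction.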