🤖 AI Summary
Large reasoning models rely on lengthy chains of thought for complex tasks, incurring substantial computational overhead. This work proposes a training-free dynamic truncation method that uses the entropy of the model's output distribution as a reliable indicator of reasoning confidence, terminating redundant steps early once confidence is high. Paired with a unified evaluation metric, the Efficiency-Performance Ratio (EPR), the adaptive truncation mechanism reduces token consumption by up to 40% across four benchmarks while incurring only minimal accuracy degradation. The method significantly outperforms existing training-free optimization strategies, offering an effective balance between computational efficiency and task performance.
📝 Abstract
Large Reasoning Models (LRMs) excel at complex reasoning tasks through extended chain-of-thought generation, but their reliance on lengthy intermediate steps incurs substantial computational cost. We find that the entropy of the model's output distribution in early reasoning steps reliably distinguishes correct from incorrect reasoning. Motivated by this observation, we propose EntroCut, a training-free method that dynamically truncates reasoning by identifying high-confidence states where reasoning can be safely terminated. To comprehensively evaluate the trade-off between efficiency and accuracy, we introduce the Efficiency-Performance Ratio (EPR), a unified metric that quantifies relative token savings per unit accuracy loss. Experiments on four benchmarks show that EntroCut reduces token usage by up to 40% with minimal accuracy sacrifice, achieving superior efficiency-performance trade-offs compared with existing training-free methods. These results demonstrate that entropy-guided dynamic truncation provides a practical approach to mitigate the inefficiency of LRMs.
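The two core ideas above, an entropy-based confidence check for early termination and an EPR-style efficiency metric, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the entropy threshold, the `should_truncate` decision rule, and the exact EPR formula (here, relative token savings divided by absolute accuracy loss) are assumptions for the sake of the example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_truncate(step_probs, threshold=0.5):
    """Hypothetical truncation criterion: stop generating further reasoning
    steps once the output distribution's entropy drops below a threshold,
    i.e., the model is confident in its next token. The threshold value
    is illustrative, not taken from the paper."""
    return entropy(step_probs) < threshold

def efficiency_performance_ratio(tokens_base, tokens_cut, acc_base, acc_cut):
    """Illustrative EPR: relative token savings per unit of accuracy loss.
    The paper's exact formulation may differ."""
    token_saving = (tokens_base - tokens_cut) / tokens_base
    acc_loss = max(acc_base - acc_cut, 1e-9)  # guard against zero loss
    return token_saving / acc_loss

# A peaked (confident) distribution has low entropy, so reasoning stops early;
# a uniform distribution has high entropy, so reasoning continues.
print(should_truncate([0.97, 0.01, 0.01, 0.01]))  # True
print(should_truncate([0.25, 0.25, 0.25, 0.25]))  # False
```

Under this reading, a method that cuts 1,000 tokens to 600 (40% savings) while accuracy falls from 0.90 to 0.88 would score EPR = 0.40 / 0.02 = 20; higher values indicate a better efficiency-performance trade-off.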