🤖 AI Summary
Large reasoning models rely on lengthy chains of thought for complex tasks, incurring substantial computational overhead. This work proposes a training-free dynamic truncation method that uses the entropy of the model's output distribution as a reliable indicator of reasoning confidence, terminating redundant steps early once confidence is high. Paired with a unified evaluation metric, the Efficiency-Performance Ratio (EPR), the adaptive truncation mechanism reduces token consumption by up to 40% across four benchmarks while incurring only minimal accuracy degradation. The method significantly outperforms existing training-free optimization strategies, offering an effective balance between computational efficiency and task performance.
📝 Abstract
Large Reasoning Models (LRMs) excel at complex reasoning tasks through extended chain-of-thought generation, but their reliance on lengthy intermediate steps incurs substantial computational cost. We find that the entropy of the model's output distribution in early reasoning steps reliably distinguishes correct from incorrect reasoning. Motivated by this observation, we propose EntroCut, a training-free method that dynamically truncates reasoning by identifying high-confidence states where reasoning can be safely terminated. To comprehensively evaluate the trade-off between efficiency and accuracy, we introduce the Efficiency-Performance Ratio (EPR), a unified metric that quantifies relative token savings per unit accuracy loss. Experiments on four benchmarks show that EntroCut reduces token usage by up to 40% with minimal accuracy sacrifice, achieving superior efficiency-performance trade-offs compared with existing training-free methods. These results demonstrate that entropy-guided dynamic truncation provides a practical approach to mitigate the inefficiency of LRMs.
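The two core ideas above, an entropy-based confidence check for early termination and an EPR-style efficiency metric, can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: the entropy threshold, the `should_truncate` decision rule, and the exact EPR formula (here, relative token savings divided by absolute accuracy loss) are assumptions for the sake of the example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_truncate(step_probs, threshold=0.5):
    """Hypothetical truncation criterion: stop generating further reasoning
    steps once the output distribution's entropy drops below a threshold,
    i.e., the model is confident in its next token. The threshold value
    is illustrative, not taken from the paper."""
    return entropy(step_probs) < threshold

def efficiency_performance_ratio(tokens_base, tokens_cut, acc_base, acc_cut):
    """Illustrative EPR: relative token savings per unit of accuracy loss.
    The paper's exact formulation may differ."""
    token_saving = (tokens_base - tokens_cut) / tokens_base
    acc_loss = max(acc_base - acc_cut, 1e-9)  # guard against zero loss
    return token_saving / acc_loss

# A peaked (confident) distribution has low entropy, so reasoning stops early;
# a uniform distribution has high entropy, so reasoning continues.
print(should_truncate([0.97, 0.01, 0.01, 0.01]))  # True
print(should_truncate([0.25, 0.25, 0.25, 0.25]))  # False
```

Under this reading, a method that cuts 1,000 tokens to 600 (40% savings) while accuracy falls from 0.90 to 0.88 would score EPR = 0.40 / 0.02 = 20; higher values indicate a better efficiency-performance trade-off.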