🤖 AI Summary
Large reasoning models (LRMs) often suffer from “overthinking,” generating excessively long and inefficient chains of thought (CoT), which increases inference cost and degrades accuracy; empirical analysis reveals that shorter reasoning paths frequently correlate with higher correctness. To address this, we propose Decoding Tree Sketching—a training-free, model-agnostic framework that approximates optimal short reasoning paths within a tree-structured search space. The method branches selectively at high-entropy tokens and terminates as soon as a reasoning path completes, combining dynamic pruning with early stopping to suppress error propagation and redundant reasoning. Evaluated on the AIME2024/2025 benchmarks, it achieves up to an 8% absolute accuracy gain, reduces average reasoning length by 23%, and cuts token repetition frequency by 12%. This work represents the first general, training-free intervention that efficiently mitigates overthinking in LRMs.
📝 Abstract
Large Reasoning Models (LRMs) demonstrate strong performance on complex reasoning tasks, yet they often suffer from overthinking, producing excessively long chain-of-thought (CoT) traces that increase inference cost and may degrade accuracy. Our analysis reveals a clear anti-correlation between reasoning length and accuracy: across multiple stochastic decodes, shorter reasoning paths consistently achieve the highest correctness, while longer ones accumulate errors and repetitions. These short optimal reasoning paths could ideally be found through full enumeration of the reasoning space. However, the tree-structured reasoning space grows exponentially with sequence length, rendering exhaustive exploration infeasible. To address this, we propose DTS, a model-agnostic decoding framework that sketches the reasoning space by selectively branching at high-entropy tokens and applies early stopping to select the shortest completed reasoning path. This approach approximates the optimal solution and enhances both efficiency and accuracy, without requiring additional training or supervision. Experiments on the AIME2024 and AIME2025 datasets with DeepSeek-R1-Distill-Qwen-7B and 1.5B show that DTS improves accuracy by up to 8%, reduces average reasoning length by 23%, and decreases repetition frequency by 12%, demonstrating that DTS enables scalable and efficient LRM reasoning.
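The core decoding loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration only: the real DTS operates on an LRM's softmax logits, whereas here a toy lookup table (`next_token_probs`) stands in for the model, and the entropy threshold, top-k branching factor, and token names are all illustrative assumptions, not values from the paper.

```python
import math

# Toy stand-in for an LRM's next-token distribution (hypothetical;
# a real implementation would read these probabilities off the model's
# softmax over the vocabulary).
def next_token_probs(prefix):
    table = {
        "<s>": {"think": 0.5, "answer": 0.5},    # high entropy -> branch here
        "think": {"think": 0.9, "answer": 0.1},  # low entropy -> greedy step
        "answer": {"42": 1.0},
        "42": {"</s>": 1.0},
    }
    return table[prefix[-1]]

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def dts_decode(entropy_threshold=0.5, top_k=2, max_steps=20):
    """Sketch of DTS-style decoding: branch only at high-entropy tokens,
    advance all branches one token per step (so shorter paths finish
    first), and return the first path that reaches </s>."""
    frontier = [["<s>"]]
    for _ in range(max_steps):
        next_frontier = []
        for path in frontier:
            probs = next_token_probs(path)
            if entropy(probs) > entropy_threshold:
                # High uncertainty: sketch the tree with top-k branches.
                picks = sorted(probs, key=probs.get, reverse=True)[:top_k]
            else:
                # Low uncertainty: follow the single greedy continuation.
                picks = [max(probs, key=probs.get)]
            for tok in picks:
                new_path = path + [tok]
                if tok == "</s>":  # early stop: first completion is shortest
                    return new_path
                next_frontier.append(new_path)
        frontier = next_frontier
    return None
```

In this toy run, the greedy branch keeps emitting `think` tokens, while the branch opened at the high-entropy root completes in three steps; the step-synchronized frontier guarantees that this shortest completed path is returned first, mirroring how DTS favors short reasoning over long, repetitive traces.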