The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning

📅 2025-10-23
🤖 AI Summary
Large language models (LLMs) suffer from "overthinking" during parallel decoding, generating verbose, hesitant reasoning paths that degrade both inference efficiency and output confidence. Method: We propose a lightweight solution: selecting the shortest generated answer among parallel samples. This strategy is grounded in the empirical observation that high-quality reasoning paths tend to be more concise and confident, with output length serving as an implicit proxy for confidence, and it requires no additional scoring modules or model fine-tuning. Contribution/Results: Evaluated on challenging benchmarks, including mathematical reasoning (GSM8K, MATH) and program synthesis (HumanEval), length-based selection matches or exceeds self-consistency in accuracy while significantly reducing computational overhead. The approach is inherently compatible with unstructured and multi-format outputs, offering a parameter-free, plug-and-play improvement to parallel decoding in LLMs.
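The length-based selection described above can be sketched in a few lines. The sketch below is an illustration, not the authors' code; it approximates the model's token length by whitespace splitting, which is an assumption about the metric:

```python
def pick_shortest(completions):
    """Shortest-answer heuristic for parallel decoding:
    among N sampled reasoning paths, return the most concise one,
    treating brevity as an implicit proxy for confidence."""
    # Token count approximated by whitespace splitting (assumption;
    # the paper may measure length in model tokens instead).
    return min(completions, key=lambda text: len(text.split()))

samples = [
    "Step 1: 3*4 = 12. Step 2: 12+5 = 17. Answer: 17",
    "Hmm, let me think again... 3*4 might be 12, but wait, "
    "I should double-check that... 12+5... so perhaps 17? Answer: 17",
]
print(pick_shortest(samples))  # prints the first, more concise path
```

Because the heuristic never compares answers for equality, it applies directly to free-form outputs such as code, where exact-match voting is ill-defined.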

📝 Abstract
Reasoning models represent a significant advance in LLM capabilities, particularly for complex reasoning tasks such as mathematics and coding. Previous studies confirm that parallel test-time compute (sampling multiple solutions and selecting the best one) can further enhance the predictive performance of LLMs. However, strategies in this area often require complex scoring, thus increasing computational cost and complexity. In this work, we demonstrate that the simple and counterintuitive heuristic of selecting the shortest solution is highly effective. We posit that the observed effectiveness stems from models operating in two distinct regimes: a concise, confident conventional regime and a verbose overthinking regime characterized by uncertainty, and we show evidence of a critical point where the overthinking regime begins to be significant. By selecting the shortest answer, the heuristic preferentially samples from the conventional regime. We confirm that this approach is competitive with more complex methods such as self-consistency across two challenging benchmarks while significantly reducing computational overhead. The shortest-answer heuristic provides a Pareto improvement over self-consistency and applies even to tasks where output equality is not well defined.
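The abstract's contrast between self-consistency and the shortest-answer heuristic can be made concrete. In the hypothetical sketch below (not the paper's implementation), self-consistency takes a majority vote over extracted answers, which presumes answers can be compared for equality, while shortest-answer selection needs only a length comparison:

```python
from collections import Counter

def self_consistency(answers):
    # Majority vote over sampled answers. Requires a well-defined
    # notion of answer equality, which breaks down for free-form
    # outputs such as program synthesis.
    return Counter(answers).most_common(1)[0][0]

def shortest_answer(answers):
    # Length-based selection: no equality check needed, so it
    # applies even when outputs cannot be grouped exactly.
    return min(answers, key=len)

samples = ["42", "The answer is 42.", "42", "Let me reconsider... 42"]
print(self_consistency(samples))  # "42" (2 of 4 samples agree)
print(shortest_answer(samples))   # "42" (most concise sample)
```

Both selectors agree here, but only the second remains applicable when no two samples are string-identical.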
Problem

Research questions and friction points this paper is trying to address.

Reduces computational cost in parallel reasoning methods
Addresses overthinking in test-time reasoning models
Simplifies solution selection using shortest-answer heuristic
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selects shortest solution to reduce complexity
Identifies critical point for overthinking regime
Provides Pareto improvement over self-consistency
Raul Cavalcante Dinardi
Instituto de Matemática, Estatística e Ciência da Computação, Universidade de São Paulo
Bruno Yamamoto
Escola Politécnica, Universidade de São Paulo
Anna Helena Reali Costa
Full Professor of Computer Engineering, Universidade de São Paulo
Artificial Intelligence · Machine Learning · Reinforcement Learning · Intelligent Robotics
Artur Jordao
Universidade de São Paulo (USP)
Machine Learning · Partial Least Squares · Pattern Recognition