The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning

📅 2025-10-23
🤖 AI Summary
Large language models (LLMs) suffer from "overthinking" during parallel decoding, generating verbose, hesitant reasoning paths that degrade both inference efficiency and output confidence. Method: We propose a lightweight solution: selecting the shortest generated answer among parallel samples. This strategy is grounded in the empirical observation that high-quality reasoning paths tend to be more concise and confident, with output length serving as an implicit proxy for confidence, and it requires no additional scoring modules or model fine-tuning. Contribution/Results: Evaluated on challenging benchmarks, including mathematical reasoning (GSM8K, MATH) and program synthesis (HumanEval), length-based selection matches or exceeds self-consistency in accuracy while significantly reducing computational overhead. The approach is inherently compatible with unstructured and multi-format outputs, offering a parameter-free, plug-and-play improvement to parallel decoding in LLMs.
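The length-based selection described above can be sketched in a few lines. The sketch below is an illustration, not the authors' code; it approximates the model's token length by whitespace splitting, which is an assumption about the metric:

```python
def pick_shortest(completions):
    """Shortest-answer heuristic for parallel decoding:
    among N sampled reasoning paths, return the most concise one,
    treating brevity as an implicit proxy for confidence."""
    # Token count approximated by whitespace splitting (assumption;
    # the paper may measure length in model tokens instead).
    return min(completions, key=lambda text: len(text.split()))

samples = [
    "Step 1: 3*4 = 12. Step 2: 12+5 = 17. Answer: 17",
    "Hmm, let me think again... 3*4 might be 12, but wait, "
    "I should double-check that... 12+5... so perhaps 17? Answer: 17",
]
print(pick_shortest(samples))  # prints the first, more concise path
```

Because the heuristic never compares answers for equality, it applies directly to free-form outputs such as code, where exact-match voting is ill-defined.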

📝 Abstract
Reasoning models represent a significant advance in LLM capabilities, particularly for complex reasoning tasks such as mathematics and coding. Previous studies confirm that parallel test-time compute (sampling multiple solutions and selecting the best one) can further enhance the predictive performance of LLMs. However, strategies in this area often require complex scoring, thus increasing computational cost and complexity. In this work, we demonstrate that the simple and counterintuitive heuristic of selecting the shortest solution is highly effective. We posit that the observed effectiveness stems from models operating in two distinct regimes: a concise, confident conventional regime and a verbose overthinking regime characterized by uncertainty, and we show evidence of a critical point where the overthinking regime begins to be significant. By selecting the shortest answer, the heuristic preferentially samples from the conventional regime. We confirm that this approach is competitive with more complex methods such as self-consistency across two challenging benchmarks while significantly reducing computational overhead. The shortest-answer heuristic provides a Pareto improvement over self-consistency and applies even to tasks where output equality is not well defined.
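The abstract's contrast between self-consistency and the shortest-answer heuristic can be made concrete. In the hypothetical sketch below (not the paper's implementation), self-consistency takes a majority vote over extracted answers, which presumes answers can be compared for equality, while shortest-answer selection needs only a length comparison:

```python
from collections import Counter

def self_consistency(answers):
    # Majority vote over sampled answers. Requires a well-defined
    # notion of answer equality, which breaks down for free-form
    # outputs such as program synthesis.
    return Counter(answers).most_common(1)[0][0]

def shortest_answer(answers):
    # Length-based selection: no equality check needed, so it
    # applies even when outputs cannot be grouped exactly.
    return min(answers, key=len)

samples = ["42", "The answer is 42.", "42", "Let me reconsider... 42"]
print(self_consistency(samples))  # "42" (2 of 4 samples agree)
print(shortest_answer(samples))   # "42" (most concise sample)
```

Both selectors agree here, but only the second remains applicable when no two samples are string-identical.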
Problem

Research questions and friction points this paper is trying to address.

Reduces computational cost in parallel reasoning methods
Addresses overthinking in test-time reasoning models
Simplifies solution selection using shortest-answer heuristic
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selects shortest solution to reduce complexity
Identifies critical point for overthinking regime
Provides Pareto improvement over self-consistency
Raul Cavalcante Dinardi
Instituto de Matemática, Estatística e Ciência da Computação, Universidade de São Paulo
Bruno Yamamoto
Escola Politécnica, Universidade de São Paulo
Anna Helena Reali Costa
Full Professor of Computer Engineering, Universidade de São Paulo
Artificial Intelligence · Machine Learning · Reinforcement Learning · Intelligent Robotics
Artur Jordao
Universidade de São Paulo (USP)
Machine Learning · Partial Least Squares · Pattern Recognition