A Single Revision Step Improves Token-Efficient LLM Reasoning

πŸ“… 2026-02-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models are prone to generating high-confidence hallucinated reasoning paths that can suppress the correct answer during complex reasoning. This work proposes a training-free, inference-stage framework that introduces, for the first time, a "peer review" mechanism among reasoning trajectories. By constructing a consensus packet that integrates candidate answers, confidence scores, and representative reasoning summaries, the method enables each trajectory to perform conditional self-evaluation and structured revision. This approach elevates simple voting to collaborative logical refinement, matching or surpassing the accuracy of 256-sample majority voting with just a single correction step on challenging mathematical benchmarks such as AIME and BRUMO. The method significantly outperforms conventional ensembling while maintaining low token overhead.

πŸ“ Abstract
Large language models (LLMs) achieve higher accuracy on challenging reasoning tasks by scaling test-time compute through multiple trajectory sampling. However, standard aggregation methods like majority voting or individual confidence-based filtering face a fundamental "blind spot": they evaluate each trace in isolation. As problems scale in difficulty, models often generate hallucinated paths that exhibit misleadingly high confidence, causing the true solution to be suppressed by a narrow margin in traditional voting. We ask: can we enable traces to "peer-review" each other to resolve these near-miss errors? We introduce Packet-Conditioned Revision (PACER), a training-free, inference-only framework that enables reasoning traces to revise their conclusions through a structured coordination step. After a preliminary screening of generated traces, PACER constructs a compact consensus packet containing (i) unique candidate answers, (ii) their aggregated confidence scores, and (iii) representative reasoning summaries for each candidate answer. Individual traces then perform a targeted self-review conditioned on this packet, allowing them to identify specific logical junctions where they diverged from the broader consensus and pivot if their original reasoning is found to be flawed. Final predictions are obtained via confidence-weighted voting over these revised trajectories. On challenging competitive math benchmarks such as AIME and BRUMO, PACER matches or exceeds the accuracy of 256-sample majority voting, significantly outperforming raw ensemble baselines by transforming simple consensus into a collaborative logical refinement process.
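The aggregation side of the pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the trace field names (`answer`, `confidence`, `summary`) are assumed for the example, and the per-trace self-review step (an LLM call conditioned on the packet) is omitted.

```python
from collections import defaultdict

def build_consensus_packet(traces):
    """Aggregate sampled traces into a compact consensus packet:
    per unique candidate answer, an aggregated confidence score and
    a few representative reasoning summaries (field names assumed)."""
    packet = defaultdict(lambda: {"confidence": 0.0, "summaries": []})
    for t in traces:
        entry = packet[t["answer"]]
        entry["confidence"] += t["confidence"]   # aggregated confidence per candidate
        if len(entry["summaries"]) < 2:          # keep a few representative summaries
            entry["summaries"].append(t["summary"])
    return dict(packet)

def confidence_weighted_vote(traces):
    """Final prediction: sum confidence per (possibly revised) answer."""
    scores = defaultdict(float)
    for t in traces:
        scores[t["answer"]] += t["confidence"]
    return max(scores, key=scores.get)

# Toy example: two low-confidence traces agree, one high-confidence trace dissents.
traces = [
    {"answer": "42", "confidence": 0.6, "summary": "solved via substitution"},
    {"answer": "42", "confidence": 0.5, "summary": "solved via symmetry"},
    {"answer": "17", "confidence": 0.9, "summary": "misapplied a lemma"},
]
packet = build_consensus_packet(traces)
# In PACER, each trace would now self-review conditioned on `packet`
# (the revision LLM call, omitted here) before the final vote.
print(confidence_weighted_vote(traces))  # prints "42"
```

The toy case shows the "near-miss" failure mode the paper targets: a single hallucinated trace with misleadingly high confidence (0.9) nearly outweighs two correct traces; the packet-conditioned revision step is what would let traces reconcile such conflicts before voting.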
Problem

Research questions and friction points this paper is trying to address.

LLM reasoning
hallucination
majority voting
near-miss errors
test-time compute
Innovation

Methods, ideas, or system contributions that make the work stand out.

Packet-Conditioned Revision
LLM reasoning
consensus-based revision
token-efficient inference
self-correction
πŸ”Ž Similar Papers
No similar papers found.