A Single Revision Step Improves Token-Efficient LLM Reasoning

πŸ“… 2026-02-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models are prone to generating high-confidence hallucinated reasoning paths that can suppress the correct answer during complex reasoning. This work proposes a training-free, inference-stage framework that introduces, for the first time, a "peer review" mechanism among reasoning trajectories. By constructing a consensus packet that integrates candidate answers, confidence scores, and representative reasoning summaries, the method enables each trajectory to perform conditional self-evaluation and structured revision. This approach elevates simple voting to collaborative logical refinement, matching or surpassing the accuracy of 256-sample majority voting with just a single correction step on challenging mathematical benchmarks such as AIME and BRUMO. The method significantly outperforms conventional ensembling while maintaining low token overhead.

πŸ“ Abstract
Large language models (LLMs) achieve higher accuracy on challenging reasoning tasks by scaling test-time compute through multiple trajectory sampling. However, standard aggregation methods like majority voting or individual confidence-based filtering face a fundamental "blind spot": they evaluate each trace in isolation. As problems scale in difficulty, models often generate hallucinated paths that exhibit misleadingly high confidence, causing the true solution to be suppressed by a narrow margin in traditional voting. We ask: can we enable traces to "peer-review" each other to resolve these near-miss errors? We introduce Packet-Conditioned Revision (PACER), a training-free, inference-only framework that enables reasoning traces to revise their conclusions through a structured coordination step. After a preliminary screening of generated traces, PACER constructs a compact consensus packet containing (i) unique candidate answers, (ii) their aggregated confidence scores, and (iii) representative reasoning summaries for each candidate answer. Individual traces then perform a targeted self-review conditioned on this packet, allowing them to identify specific logical junctions where they diverged from the broader consensus and pivot if their original reasoning is found to be flawed. Final predictions are obtained via confidence-weighted voting over these revised trajectories. On challenging competitive math benchmarks such as AIME and BRUMO, PACER matches or exceeds the accuracy of 256-sample majority voting, significantly outperforming raw ensemble baselines by transforming simple consensus into a collaborative logical refinement process.
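The aggregation side of the pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the trace field names (`answer`, `confidence`, `summary`) are assumed for the example, and the per-trace self-review step (an LLM call conditioned on the packet) is omitted.

```python
from collections import defaultdict

def build_consensus_packet(traces):
    """Aggregate sampled traces into a compact consensus packet:
    per unique candidate answer, an aggregated confidence score and
    a few representative reasoning summaries (field names assumed)."""
    packet = defaultdict(lambda: {"confidence": 0.0, "summaries": []})
    for t in traces:
        entry = packet[t["answer"]]
        entry["confidence"] += t["confidence"]   # aggregated confidence per candidate
        if len(entry["summaries"]) < 2:          # keep a few representative summaries
            entry["summaries"].append(t["summary"])
    return dict(packet)

def confidence_weighted_vote(traces):
    """Final prediction: sum confidence per (possibly revised) answer."""
    scores = defaultdict(float)
    for t in traces:
        scores[t["answer"]] += t["confidence"]
    return max(scores, key=scores.get)

# Toy example: two low-confidence traces agree, one high-confidence trace dissents.
traces = [
    {"answer": "42", "confidence": 0.6, "summary": "solved via substitution"},
    {"answer": "42", "confidence": 0.5, "summary": "solved via symmetry"},
    {"answer": "17", "confidence": 0.9, "summary": "misapplied a lemma"},
]
packet = build_consensus_packet(traces)
# In PACER, each trace would now self-review conditioned on `packet`
# (the revision LLM call, omitted here) before the final vote.
print(confidence_weighted_vote(traces))  # prints "42"
```

The toy case shows the "near-miss" failure mode the paper targets: a single hallucinated trace with misleadingly high confidence (0.9) nearly outweighs two correct traces; the packet-conditioned revision step is what would let traces reconcile such conflicts before voting.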
Problem

Research questions and friction points this paper is trying to address.

LLM reasoning
hallucination
majority voting
near-miss errors
test-time compute
Innovation

Methods, ideas, or system contributions that make the work stand out.

Packet-Conditioned Revision
LLM reasoning
consensus-based revision
token-efficient inference
self-correction
πŸ”Ž Similar Papers
No similar papers found.