CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language model inference often relies on extensive parallel sampling to improve accuracy, incurring substantial computational cost. This work proposes a confidence-guided self-refinement mechanism that uses a lightweight Conv1D controller (only 211k parameters) to dynamically decide whether to terminate, re-examine the current reasoning path, or explore a new one. Crucially, confidence is treated as a control signal rather than a guarantee of correctness, enabling modular and scalable self-correction. The approach supports a hybrid sequential-parallel strategy compatible with external verifiers and achieves high efficiency across multiple reasoning benchmarks: it requires an average of only 2.7 refinement steps, reduces token consumption by roughly 190× relative to 512-sample baselines, and attains 92.6% precision when terminating on confidence.
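The control loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the thresholds, the hand-rolled moving-average "convolution" (standing in for the paper's learned 211k-parameter Conv1D controller), and all function names are assumptions.

```python
# Hypothetical sketch of a confidence-guided refinement controller.
# The real controller is a trained Conv1D network; here a uniform-kernel
# smoothing plus threshold rules stands in for it.

HALT, REEXAMINE, EXPLORE = "halt", "re-examine", "explore"

def smooth(trace, k=3):
    """1-D convolution with a uniform kernel (moving average) over the
    per-token confidence trace, with edge padding."""
    pad = k // 2
    padded = [trace[0]] * pad + list(trace) + [trace[-1]] * pad
    return [sum(padded[i:i + k]) / k for i in range(len(trace))]

def decide(trace, halt_thr=0.9, explore_thr=0.5):
    """Map a full confidence trace to one of three actions:
    consistently high -> halt; consistently low -> explore a new path;
    otherwise (e.g. a late dip) -> re-examine the current path."""
    s = smooth(trace)
    if min(s) >= halt_thr:
        return HALT
    if sum(s) / len(s) < explore_thr:
        return EXPLORE
    return REEXAMINE

def refine(generate, trace_of, max_steps=8):
    """Sequential refinement loop: generate, inspect confidence, act."""
    answer = generate(hint=None)
    for step in range(1, max_steps + 1):
        action = decide(trace_of(answer))
        if action == HALT:
            return answer, step
        # EXPLORE restarts from scratch; RE-EXAMINE revises in place.
        answer = generate(hint=None if action == EXPLORE else answer)
    return answer, max_steps
```

Because the controller only reads the confidence trace, it stays decoupled from the frozen base model, which is what makes the design modular.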

📝 Abstract
Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce CoRefine, a confidence-guided self-refinement method that achieves competitive accuracy using a fraction of the tokens via a lightweight 211k-parameter Conv1D controller atop a frozen LLM. The controller consumes full-trace confidence to decide whether to halt, re-examine, or try a different approach, enabling targeted self-correction with an average of 2.7 refinement steps per problem and roughly 190-fold token reduction relative to 512-sample baselines. Across diverse reasoning benchmarks and three open-source models, the controller achieves 92.6 percent precision when it confidently halts, indicating that confidence dynamics reliably signal correctness without ground-truth verification. We extend this to CoRefine-Tree, a hybrid sequential-parallel variant that adaptively balances exploration and exploitation, with easy serving integration and verifier compatibility. By treating confidence as a control signal rather than a correctness guarantee, CoRefine provides a modular primitive for scalable reasoning and agentic settings with imperfect verifiers.
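The CoRefine-Tree variant mentioned in the abstract balances sequential refinement (exploitation) against parallel sampling (exploration). A toy sketch of that idea, under stated assumptions: the branching rule, the `width` budget, and the `generate`/`confidence` interfaces are all illustrative, not the paper's algorithm.

```python
# Hypothetical sketch of a hybrid sequential-parallel strategy in the
# spirit of CoRefine-Tree: refine the most confident branch in place,
# but fan out a new parallel branch when all branches look weak.

def corefine_tree(generate, confidence, width=3, max_steps=6, halt_thr=0.9):
    """Maintain a small frontier of candidate answers; halt early when
    the best branch is confident enough."""
    frontier = [generate(hint=None)]
    for _ in range(max_steps):
        frontier.sort(key=confidence, reverse=True)
        best = frontier[0]
        if confidence(best) >= halt_thr:
            return best                           # exploit: confident halt
        if confidence(best) < 0.5 and len(frontier) < width:
            frontier.append(generate(hint=None))  # explore: new parallel branch
        else:
            frontier[0] = generate(hint=best)     # refine best branch in place
    return max(frontier, key=confidence)
```

An external verifier, where available, could replace or complement `confidence` as the ranking signal, which is how the abstract's "verifier compatibility" would slot in.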
Problem

Research questions and friction points this paper is trying to address.

test-time compute
reasoning accuracy
token efficiency
large language models
adaptive inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence-guided refinement
test-time compute adaptation
lightweight controller
token-efficient reasoning
self-correction