One-Token Verification for Reasoning Correctness Estimation

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high latency of existing reasoning methods that rely on multi-sample decoding, as well as the lack of an efficient, reliable mechanism for assessing correctness within a single decoding trajectory. The authors propose a method that estimates token-level reasoning correctness in real time during a single forward pass by introducing learnable verification tokens and key-value cache probes, enabling reliability judgments at any generation step without additional inference overhead. This approach supports a correctness-predictive early-stopping strategy. Combined with LoRA fine-tuning, the method outperforms current verifiers on mathematical reasoning benchmarks and reduces token usage by up to 90% through early stopping, favoring shorter yet more reliable answers.

📝 Abstract
Recent breakthroughs in large language models (LLMs) have led to notable successes in complex reasoning tasks, such as mathematical problem solving. A common strategy for improving performance is parallel thinking, in which multiple reasoning traces are generated and the final prediction is made using aggregation schemes like majority voting or best-of-$N$ decoding. However, two key challenges persist. First, multi-sample decoding incurs substantial inference latency, especially for long-form outputs. Second, effective mechanisms for reliably assessing the correctness of individual reasoning traces are still limited. To address these challenges, we introduce One-Token Verification (OTV), a computational method that estimates reasoning correctness in a single forward pass during generation. OTV is activated by a learnable token and integrated into the LLM via low-rank adaptation to probe internal reasoning signals through the key-value cache, supporting token-level correctness estimation at any stage of generation without disrupting primary reasoning. Experiments on mathematical reasoning benchmarks demonstrate that OTV consistently surpasses existing verifiers. Additionally, OTV reduces token usage by up to $90\%$ through correctness-guided early termination, prioritizing shorter, more reliable solutions.
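The correctness-guided early-termination idea in the abstract can be sketched as follows. This is an illustrative toy, not the paper's implementation: the probe weights, hidden size, threshold, and the `generate_with_early_stop` helper are all hypothetical stand-ins, and a pooled vector takes the place of the real key-value cache that OTV would query inside the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper does not specify these.
HIDDEN = 16          # size of the pooled KV-cache summary vector
THRESHOLD = 0.9      # correctness score required to stop early

# Stand-in "probe": a learned linear head over a pooled KV-cache
# representation, producing a scalar correctness score in (0, 1).
probe_w = rng.normal(size=HIDDEN)


def correctness_score(kv_summary: np.ndarray) -> float:
    """Sigmoid of a linear probe applied to the pooled KV-cache vector."""
    logit = float(kv_summary @ probe_w)
    return 1.0 / (1.0 + np.exp(-logit))


def generate_with_early_stop(step_states, threshold=THRESHOLD):
    """Decode step by step; stop once the probe deems the trace reliable.

    `step_states` stands in for the per-step KV-cache summaries a real
    model would expose during generation. Returns the stopping step
    (1-indexed) and the score at that step.
    """
    score = 0.0
    for step, state in enumerate(step_states, start=1):
        score = correctness_score(state)
        if score >= threshold:
            return step, score   # correctness-guided early termination
    return len(step_states), score  # ran to the end of the trace
```

A trace whose intermediate states look uninformative to the probe keeps generating, while a state the probe scores highly triggers an early stop, which is where the reported token savings would come from.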
Problem

Research questions and friction points this paper is trying to address.

reasoning correctness
inference latency
verification
large language models
parallel thinking
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-Token Verification
reasoning correctness estimation
low-rank adaptation
early termination
key-value cache