Adaptive Test-Time Compute Allocation via Learned Heuristics over Categorical Structure

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses how to allocate limited verification resources in cost-constrained reasoning, where verifier calls are often wasted on redundant or low-potential intermediate hypotheses. The authors propose a state-level selective verification framework that concentrates verification on the most informative intermediate states through three mechanisms: feasibility gating over a structured action interface, pre-verification ranking based on learned state distances and residual scores, and adaptive invocation of the verifier guided by local uncertainty. Unlike uniform intermediate verification or solution-level sampling, the approach allocates verification according to how informative each intermediate reasoning state is. On the MATH benchmark, it achieves higher accuracy than best-of-N, majority voting, and beam search while using 44% fewer verifier calls.

📝 Abstract
Test-time computation has become a primary driver of progress in large language model (LLM) reasoning, but it is increasingly bottlenecked by expensive verification. In many reasoning systems, a large fraction of verifier calls are spent on redundant or unpromising intermediate hypotheses. We study reasoning under a verification-cost-limited setting and ask how verification effort should be allocated across intermediate states. We propose a state-level selective verification framework that combines (i) deterministic feasibility gating over a structured move interface, (ii) pre-verification ranking using a hybrid of learned state-distance and residual scoring, and (iii) adaptive allocation of verifier calls based on local uncertainty. Unlike solution-level best-of-N or uniform intermediate verification, our method distributes verification where it is most informative. On the MATH benchmark, our approach achieves higher accuracy than best-of-N, majority voting, and beam search while using 44% fewer verifier calls.
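
The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation. The `State` fields, the linear mix in `hybrid_score`, the `alpha` weight, and the uncertainty `threshold` are all hypothetical choices made for illustration, since the abstract does not specify how the scores are combined or how uncertainty triggers verification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class State:
    """A candidate intermediate reasoning state (all fields are illustrative)."""
    name: str
    feasible: bool      # result of the deterministic feasibility gate (assumed precomputed)
    distance: float     # learned state-distance estimate, lower is better (assumption)
    residual: float     # residual score, lower is better (assumption)
    uncertainty: float  # local uncertainty in [0, 1] (assumption)


def hybrid_score(state: State, alpha: float = 0.5) -> float:
    """Hypothetical hybrid ranking score: a linear mix of distance and residual."""
    return alpha * state.distance + (1.0 - alpha) * state.residual


def plan_verification(
    states: List[State],
    budget: int,
    threshold: float = 0.3,
    alpha: float = 0.5,
) -> List[str]:
    """Return the names of the states that would receive a verifier call."""
    # (i) Feasibility gating: drop states rejected by the deterministic gate.
    candidates = [s for s in states if s.feasible]
    # (ii) Pre-verification ranking: most promising (lowest score) first.
    candidates.sort(key=lambda s: hybrid_score(s, alpha))
    # (iii) Adaptive allocation: spend calls only on uncertain states,
    # and stop once the verification budget is used up.
    to_verify: List[str] = []
    for s in candidates:
        if len(to_verify) >= budget:
            break
        if s.uncertainty >= threshold:
            to_verify.append(s.name)
    return to_verify


demo_states = [
    State("A", feasible=True, distance=0.2, residual=0.4, uncertainty=0.8),
    State("B", feasible=False, distance=0.1, residual=0.1, uncertainty=0.9),
    State("C", feasible=True, distance=0.1, residual=0.1, uncertainty=0.1),
    State("D", feasible=True, distance=0.6, residual=0.2, uncertainty=0.5),
    State("E", feasible=True, distance=0.9, residual=0.9, uncertainty=0.9),
]

print(plan_verification(demo_states, budget=2))  # → ['A', 'D']
```

In this toy run, state B is removed by the gate, state C ranks first but is skipped because its uncertainty is below the threshold, and the two available verifier calls go to A and D. State E is uncertain but ranks last, so it receives no call under this budget.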

Problem

Research questions and friction points this paper is trying to address.

test-time computation
verification cost
reasoning
large language models
resource allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time compute allocation
selective verification
learned heuristics
verification-cost-limited reasoning
adaptive resource allocation