Adaptive Generate-Rank-Verify: Inference-Time Search with Costly Verification

📅 2026-05-17

🤖 AI Summary

This work addresses how to find the first correct output at inference time while minimizing verification cost, in pipelines that combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. The problem is formalized as generative active search, a cost-sensitive first-positive search problem. When the score distribution and the score-conditioned success function are known, the optimal policy is characterized by dynamic programming. For the realistic case where both are unknown, the authors propose ADAP, a shellwise adaptive generate-rank-verify algorithm that progressively increases the number of sampled responses and the number of top-ranked candidates it verifies, without prior knowledge of the underlying distribution. Under the monotonicity assumption that higher reward scores are no less likely to pass verification, ADAP's expected cost is within a constant factor of the distribution-aware optimum. Lower bounds based on a centered star number show that some structural assumption on the score-label relationship is necessary. Experiments on mathematical reasoning and competitive programming support the predicted advantage over both fixed non-adaptive policies and difficulty-adaptive baselines.

📝 Abstract

Many inference-time language-model pipelines combine a cheap reward signal with an expensive verifier, such as exact answer checking in mathematical reasoning or hidden-test execution in code generation. We formalize this setting through a learning-theoretic lens as generative active search: a cost-sensitive first-positive search problem in which a policy adaptively samples candidates from an unknown distribution, observes cheap scores, and pays for verifier labels until it finds a positive example. For a fixed prompt, the generator and reward model induce two unknown objects: a distribution over reward scores and a score-conditioned success function. When these quantities are known, we characterize the distribution-aware optimal policy using a dynamic programming approach. In the realistic setting where both the score distribution and success function are unknown, we propose ADAP, a shellwise adaptive generate-rank-verify algorithm that progressively increases the number of sampled responses and top-ranked verifications. Under the monotonicity assumption that higher reward scores are no less likely to pass verification, we show that ADAP achieves expected cost within a constant factor of the distribution-aware optimum. We complement this result with learning-theoretic lower bounds, based on a centered star number, showing that structural assumptions on the score-label relationship are necessary. Experiments on mathematical reasoning and competitive programming validate the predicted advantage over both fixed non-adaptive policies and difficulty-adaptive baselines.
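
The shellwise generate-rank-verify idea can be illustrated with a minimal Python sketch. This is not the authors' ADAP algorithm: the abstract does not specify the shell schedule or the cost model, so the doubling schedule (2^(j+1) samples and 2^j top-ranked verifications in shell j), the per-sample and per-verification costs, and the names `shellwise_search`, `sample`, `score`, and `verify` are all assumptions made for illustration.

```python
import itertools
from typing import Callable, List, Optional, Set, Tuple, TypeVar

T = TypeVar("T")


def shellwise_search(
    sample: Callable[[], T],          # draws one candidate from the generator
    score: Callable[[T], float],      # cheap reward signal
    verify: Callable[[T], bool],      # expensive verifier
    sample_cost: float = 1.0,         # assumed cost per sample
    verify_cost: float = 10.0,        # assumed cost per verification
    max_shells: int = 8,
) -> Tuple[Optional[T], float]:
    """Hypothetical shellwise loop: grow the pool, then verify its top-ranked part.

    Returns the first verified-positive candidate (or None) and the total cost paid.
    """
    pool: List[Tuple[float, int, T]] = []   # (score, index, candidate)
    checked: Set[int] = set()               # indices already sent to the verifier
    cost = 0.0
    for shell in range(max_shells):
        n_target = 2 ** (shell + 1)          # assumed schedule: pool size doubles
        k_target = 2 ** shell                # assumed schedule: verify depth doubles
        while len(pool) < n_target:
            cand = sample()
            pool.append((score(cand), len(pool), cand))
            cost += sample_cost
        # Rank the whole pool by the cheap score, highest first.
        ranked = sorted(pool, key=lambda item: item[0], reverse=True)
        for _, idx, cand in ranked[:k_target]:
            if idx in checked:               # never pay twice for the same candidate
                continue
            checked.add(idx)
            cost += verify_cost
            if verify(cand):
                return cand, cost            # stop at the first positive
    return None, cost


# Toy usage: candidates are 0, 1, 2, ...; the score is the value itself, and
# verification succeeds for values >= 5 (so monotonicity holds trivially).
counter = itertools.count()
found, total = shellwise_search(lambda: next(counter), float, lambda x: x >= 5)
```

In this toy run the loop stops in the third shell: it has drawn 8 samples and paid for 4 verifications, so `found` is 7 and `total` is 48.0. The sketch only shows the control flow of interleaving sampling with top-ranked verification; it makes no claim about the constant-factor guarantee stated in the abstract.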

Problem

Research questions and friction points this paper is trying to address.

inference-time search

costly verification

generative active search

adaptive verification

language model

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive inference

cost-sensitive search

generate-rank-verify