Budget-aware Test-time Scaling via Discriminative Verification

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the prohibitively high computational cost of test-time scaling for large language models (LLMs) on complex reasoning tasks, this paper proposes a hybrid scaling paradigm that integrates discriminative verification with self-consistency ensembling. Unlike costly generative verification—where multiple candidate solutions are generated and re-ranked—our approach employs a lightweight discriminative verifier to score, filter, and weight candidate outputs, enabling more efficient aggregation under fixed computational budgets. Evaluated on mathematical reasoning benchmarks including AIME, our method achieves up to a 15.3% absolute accuracy improvement over state-of-the-art generative verification baselines, while substantially reducing inference latency and resource consumption. The key contribution is the first integration of discriminative verification into the self-consistency framework, yielding Pareto-optimal improvements in both performance and efficiency. This advances practical test-time scaling by introducing a computationally frugal yet highly effective alternative to generation-heavy approaches.

📝 Abstract
Test-time scaling is a powerful strategy for boosting the performance of large language models on complex reasoning tasks. While state-of-the-art approaches often employ generative verifiers to select the best solution from a pool of candidates, this method incurs prohibitive computational costs, limiting its practicality. In this work, we shift the focus to a more budget-aware paradigm: discriminative verification. We conduct a thorough empirical analysis and demonstrate that while discriminative verifiers may underperform in isolation, combining them with self-consistency in a hybrid approach creates a powerful and efficient test-time scaling mechanism. Notably, under a fixed compute budget, this hybrid approach surpasses state-of-the-art generative verification by a significant margin, achieving up to 15.3% higher accuracy on AIME2025. Our findings establish that for practical, real-world applications, budget-aware scaling with discriminative verifiers is not only a "free" upgrade over self-consistency, but also a more effective and efficient alternative to costly generative techniques. Code is available at https://github.com/wang-research-lab/verification.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of test-time scaling methods
Improving reasoning performance under fixed compute budgets
Combining discriminative verification with self-consistency efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid discriminative verification with self-consistency
Budget-aware scaling reduces computational costs
Outperforms generative verification under fixed compute
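The hybrid mechanism described above can be sketched as verifier-weighted voting: instead of a plain majority vote over sampled answers (self-consistency), each candidate's vote is weighted by a lightweight discriminative verifier's score. The sketch below is illustrative only; the `verifier_score` interface and the stub verifier are hypothetical, not the paper's actual implementation.

```python
from collections import defaultdict

def weighted_self_consistency(candidates, verifier_score):
    """Aggregate sampled answers by verifier-weighted voting.

    candidates: list of (answer, reasoning_trace) pairs sampled from the LLM.
    verifier_score: callable scoring a reasoning trace (a discriminative
        verifier; hypothetical interface for illustration).
    Returns the answer whose candidates accumulate the highest total score.
    """
    weights = defaultdict(float)
    for answer, trace in candidates:
        weights[answer] += verifier_score(trace)
    return max(weights, key=weights.get)

# With a uniform verifier, this reduces to plain self-consistency (majority vote).
candidates = [("42", "trace a"), ("42", "trace b"), ("7", "trace c")]
print(weighted_self_consistency(candidates, lambda trace: 1.0))  # -> 42

# A verifier that strongly favors one trace can overturn the majority,
# filtering out frequent-but-wrong answers under the same sample budget.
favor_c = lambda trace: 3.0 if trace == "trace c" else 1.0
print(weighted_self_consistency(candidates, favor_c))  # -> 7
```

Because scoring a trace with a discriminative verifier is far cheaper than generating and re-ranking full candidate solutions, this aggregation adds little cost on top of the samples self-consistency already draws.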