Parallel Test-Time Scaling with Multi-Sequence Verifiers

📅 2026-03-03
🤖 AI Summary
This work addresses two limitations of parallel test-time scaling: inaccurate candidate selection and high inference latency. It proposes the Multi-Sequence Verifier (MSV), described as the first verifier to jointly process all candidate solutions and model their interactions, which improves verifier calibration and, in turn, best-of-N selection. A streaming MSV variant supports an early-stopping framework that fully leverages parallel decoding, in contrast to prior multi-sequence early-exit methods that decode sequences one by one. In this setting, MSV reaches the same target accuracy with roughly half the latency of a counterpart verifier that scores each candidate in isolation.
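Since the summary turns on verifier calibration, a small illustration of one common way to quantify it may help. The sketch below computes expected calibration error (ECE) with equal-width bins. The choice of ECE, the function name, and the bin count are assumptions made for illustration; the summary does not state which calibration metric the paper reports.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    labels: Sequence[int],
    n_bins: int = 10,
) -> float:
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    `confidences` are verifier scores in [0, 1]; `labels` are 1 if the
    candidate is actually correct, else 0. (Illustrative helper, not the
    paper's evaluation code.)
    """
    bins = [[] for _ in range(n_bins)]
    for conf, label in zip(confidences, labels):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, label))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece
```

A lower value means the scores track the true probability of correctness more closely, which is what makes a fixed confidence threshold meaningful for early stopping.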

📝 Abstract
Parallel test-time scaling, which generates multiple candidate solutions for a single problem, is a powerful technique for improving large language model performance. However, it is hindered by two key bottlenecks: accurately selecting the correct solution from the candidate pool, and the high inference latency from generating many full solutions. We argue that both challenges are fundamentally linked to verifier calibration. A well-calibrated verifier not only improves answer selection, but also enables early-stopping strategies to reduce latency. However, existing verifiers are limited as they score each candidate in isolation, overlooking rich contextual information across the set of candidates. To address this, we introduce the Multi-Sequence Verifier (MSV), the first verifier designed to jointly process all candidate solutions and model their interactions. MSV achieves improved calibration, which directly enhances best-of-N selection performance. We further introduce a streaming MSV variant that empowers a novel early-stopping framework. Our novel framework fully leverages parallel decoding, which contrasts with the existing multi-sequence early exit works that decode sequences one by one and thus incur significant latency. In this novel setting, MSV can achieve the same target accuracy with around half the latency that would be required with its counterpart that scores each solution in isolation.
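The two uses of a verifier described in the abstract, best-of-N selection and early stopping under parallel decoding, can be sketched as follows. This is a minimal toy sketch under stated assumptions, not the paper's method: `step_fn`, `score_fn`, and the fixed-threshold stopping rule are hypothetical stand-ins. The only property taken from the abstract is that the scorer sees all candidate prefixes at once, so a multi-sequence verifier can use cross-candidate context, while a verifier that scores each candidate in isolation is the special case where each score depends only on its own prefix.

```python
from typing import Callable, List, Sequence, Tuple


def best_of_n(candidates: Sequence[str], scores: Sequence[float]) -> str:
    """Return the candidate with the highest verifier score."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


def parallel_early_stop(
    step_fn: Callable[[int, int], int],
    score_fn: Callable[[List[List[int]]], List[float]],
    n: int,
    threshold: float,
    max_steps: int,
) -> Tuple[int, int]:
    """Advance n sequences in lockstep and stop once one looks good enough.

    step_fn(i, t)  -> next token of sequence i at step t (hypothetical decoder).
    score_fn(all)  -> one confidence per sequence, computed from ALL prefixes
                      jointly (hypothetical streaming verifier).
    Returns (index of the selected sequence, number of steps decoded).
    The threshold rule is an assumption made for illustration.
    """
    prefixes: List[List[int]] = [[] for _ in range(n)]
    best = 0
    for step in range(1, max_steps + 1):
        # One decoding step for every sequence; in a real system this is
        # a single batched forward pass rather than a Python loop.
        for i in range(n):
            prefixes[i].append(step_fn(i, step))
        scores = score_fn(prefixes)
        best = max(range(n), key=lambda i: scores[i])
        if scores[best] >= threshold:
            return best, step
    return best, max_steps


# Toy usage: sequence i emits token t * (i + 1); confidence grows with the sum.
def toy_step(i: int, t: int) -> int:
    return t * (i + 1)


def toy_score(prefixes: List[List[int]]) -> List[float]:
    return [min(1.0, sum(p) / 10) for p in prefixes]


selected, steps_used = parallel_early_stop(
    toy_step, toy_score, n=3, threshold=0.85, max_steps=10
)
```

Because all sequences advance together, wall-clock latency is governed by the number of lockstep steps before the threshold is crossed, which is why a better-calibrated scorer can translate directly into earlier stopping.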

Problem

Research questions and friction points this paper is trying to address.

parallel test-time scaling
verifier calibration
inference latency
candidate selection
early stopping

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Sequence Verifier
parallel test-time scaling
verifier calibration
early stopping
parallel decoding