Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking

📅 2025-10-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Minor inaccuracies in process verifiers are readily amplified during language-model generation, leading to catastrophic failures; moreover, training high-quality verifiers is prohibitively expensive. Method: We propose the test-time sampling algorithm VGB, the first to bring the Sinclair–Jerrum random walk from theoretical computer science into language generation. VGB performs probability-guided, probabilistic backtracking over the tree of partial autoregressive generations, making decoding robust to erroneous verifier signals. Contribution: We establish a theoretical framework that formally connects process verification to approximate sampling, with provable robustness guarantees. Empirically, VGB significantly mitigates the performance degradation induced by verifier errors on both synthetic and real-world tasks, consistently outperforming baselines across multiple evaluation metrics.

๐Ÿ“ Abstract
Test-time algorithms that combine the generative power of language models with process verifiers that assess the quality of partial generations offer a promising lever for eliciting new reasoning capabilities, but the algorithmic design space and computational scaling properties of such approaches are still opaque, and their benefits are far from apparent when one accounts for the cost of learning a high-quality verifier. Our starting point is the observation that seemingly benign errors in a learned verifier can lead to catastrophic failures for standard decoding techniques due to error amplification during the course of generation. We then ask: can this be improved with more sophisticated decoding strategies? We introduce a new process-guided test-time sampling algorithm, VGB, which uses theoretically grounded backtracking to achieve provably better robustness to verifier errors. VGB interprets autoregressive generation as a random walk on a tree of partial generations, with transition probabilities guided by the process verifier and base model; crucially, backtracking occurs probabilistically. This process generalizes the seminal Sinclair-Jerrum random walk (Sinclair & Jerrum, 1989) from the literature on approximate counting and sampling in theoretical computer science, and a conceptual contribution of our work is to highlight parallels with this literature. Empirically, we demonstrate on both synthetic and real language modeling tasks that VGB outperforms baselines on a variety of metrics.
Problem

Research questions and friction points this paper is trying to address.

Addresses catastrophic failures from imperfect process verifiers in language models
Improves robustness to verifier errors with backtracking sampling algorithms
Enhances test-time decoding strategies for better reasoning capabilities
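The amplification effect can be made concrete with a back-of-the-envelope sketch (a toy illustration with assumed numbers, not the paper's analysis): if a decoder trusts the verifier unconditionally and the verifier makes a small per-step error, the chance of a good trajectory surviving decays exponentially with generation length.

```python
# Toy illustration of verifier error amplification (hypothetical numbers,
# not from the paper): suppose the verifier wrongly rejects a good partial
# generation with probability eps at each of T steps, and the decoder
# trusts it unconditionally. A good trajectory then survives all T checks
# with probability (1 - eps)**T, which shrinks exponentially in T.
eps = 0.02   # assumed per-step verifier error rate
T = 100      # generation length in steps
survival = (1 - eps) ** T
print(f"survival probability: {survival:.3f}")  # about 0.133
```

Even a 2% per-step error rate, compounded over 100 steps, discards roughly 87% of good trajectories, which is the error-amplification failure mode the paper targets.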
Innovation

Methods, ideas, or system contributions that make the work stand out.

VGB algorithm uses backtracking for robustness
Probabilistic backtracking guided by verifier and model
Generalizes Sinclair-Jerrum random walk theory
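The mechanism in the list above can be sketched as a random walk on the tree of partial generations: at each step the walk either backtracks one token or extends by one, with probabilities shaped by the base model and the verifier. Everything below (the toy model, toy verifier, weighting scheme, and function names) is a hypothetical illustration of the idea, not the paper's implementation.

```python
import random

# Hypothetical sketch of verifier-guided generation with probabilistic
# backtracking, loosely in the spirit of a Sinclair-Jerrum-style walk on
# the tree of partial generations. Not the paper's VGB implementation.

VOCAB = [0, 1]   # toy two-token vocabulary
MAX_LEN = 4      # leaves of the generation tree are length-4 sequences

def toy_model(prefix, token):
    """Base-model probability of `token` given `prefix` (uniform here)."""
    return 1.0 / len(VOCAB)

def toy_verifier(prefix):
    """Toy process verifier: scores prefixes higher the more 1s they have."""
    return 0.25 + 0.75 * (sum(prefix) / MAX_LEN)

def vgb_sample(rng, max_steps=10_000):
    """Walk on the generation tree: from a partial sequence, either
    backtrack one token or extend by one, with move probabilities
    proportional to verifier-weighted base-model scores."""
    prefix = []
    for _ in range(max_steps):
        if len(prefix) == MAX_LEN:
            return prefix  # reached a leaf: emit the full generation
        # Child weights: base-model prob x verifier score of the extension.
        child_w = [toy_model(prefix, t) * toy_verifier(prefix + [t])
                   for t in VOCAB]
        # Backtrack weight: verifier score of the current prefix
        # (zero at the root, which has no parent).
        back_w = toy_verifier(prefix) if prefix else 0.0
        r = rng.random() * (back_w + sum(child_w))
        if r < back_w:
            prefix.pop()              # probabilistic backtrack
        else:
            r -= back_w
            for t, w in zip(VOCAB, child_w):
                if r < w:
                    prefix.append(t)  # extend with sampled token
                    break
                r -= w
            else:
                prefix.append(VOCAB[-1])  # floating-point guard
    return prefix

rng = random.Random(0)
sample = vgb_sample(rng)
print(sample)
```

The key design point the sketch tries to capture is that backtracking is itself probabilistic rather than triggered by a hard verifier threshold, so a single erroneous verifier score nudges the walk instead of irrevocably killing or committing a trajectory.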