Future Validity is the Missing Statistic: From Impossibility to $Φ$-Estimation for Grammar-Faithful Speculative Decoding

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing grammar-constrained speculative decoding methods sample from a locally projected distribution rather than the grammar-conditional distribution users intend, which can introduce substantial bias. This work frames the target distribution as a Doob h-transform of the base model and identifies the future-validity function Φ as the missing correction statistic. It extends the GAD impossibility result to speculative decoders that have only local mask access, and shows that an oracle decoder with exact Φ samples from the target distribution exactly. The paper ties the accuracy of the Φ estimate to distributional fidelity by bounding the total-variation error under approximate Φ. A hierarchy of Φ estimators is evaluated on tractable Dyck and finite JSON languages: OneStep reduces Dyck total-variation distance by 14% with under 1% throughput overhead, exact dynamic programming reduces it by 97%, and finite-language correction closes JSON gaps to numerical precision.
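To make the role of Φ concrete, the following is a minimal sketch on a toy problem, not the paper's implementation. It assumes a hypothetical context-independent base model over three tokens (`(`, `)`, and an end marker `$`) and a depth-1 Dyck language capped at `N` bracket tokens. The names `P`, `N`, `phi`, `successors`, `step_dist`, and `sequence_dist` are all invented for illustration. The sketch computes Φ exactly by dynamic programming, then compares the locally masked sampler with the Φ-corrected one.

```python
from functools import lru_cache

# Hypothetical toy base model: i.i.d. token probabilities (an assumption, not a real LM).
P = {"(": 0.6, ")": 0.3, "$": 0.1}  # "$" marks end of sequence
N = 6                               # cap on the number of bracket tokens


@lru_cache(maxsize=None)
def phi(depth, used):
    """Exact future validity Pr_p[valid completion | state], by dynamic programming."""
    total = P["$"] if depth == 0 else 0.0  # stopping now is valid only when balanced
    if used < N:
        total += P["("] * phi(depth + 1, used + 1)
        if depth > 0:
            total += P[")"] * phi(depth - 1, used + 1)
    return total


def successors(depth, used):
    """Yield (token, Phi of the extended prefix, next state) for each candidate token."""
    yield "$", (1.0 if depth == 0 else 0.0), None
    if used < N:
        yield "(", phi(depth + 1, used + 1), (depth + 1, used + 1)
        if depth > 0:
            yield ")", phi(depth - 1, used + 1), (depth - 1, used + 1)


def step_dist(depth, used, exact):
    """Next-token law: weight by Phi when exact=True, by a 0/1 mask when exact=False."""
    weights = {}
    for tok, h, nxt in successors(depth, used):
        if h > 0.0:  # the token keeps some valid completion reachable
            weights[tok] = (P[tok] * (h if exact else 1.0), nxt)
    z = sum(w for w, _ in weights.values())
    return {tok: (w / z, nxt) for tok, (w, nxt) in weights.items()}


def sequence_dist(exact):
    """Enumerate every valid string together with its probability under the sampler."""
    out = {}

    def walk(prefix, state, prob):
        for tok, (q, nxt) in step_dist(*state, exact).items():
            if nxt is None:
                out[prefix] = prob * q
            else:
                walk(prefix + tok, nxt, prob * q)

    walk("", (0, 0), 1.0)
    return out


mu_star = sequence_dist(exact=True)   # Phi-corrected (grammar-conditional) law
mu_proj = sequence_dist(exact=False)  # locally masked law
tv = 0.5 * sum(abs(mu_star[s] - mu_proj[s]) for s in mu_star)
print(f"Phi(root) = {phi(0, 0):.6f}, TV(mu_star, mu_proj) = {tv:.3f}")
```

In this toy setting the masked sampler renormalizes over allowed tokens at every step, so it treats stopping and opening a new bracket as equally "safe", while the Φ-weighted sampler discounts continuations that are unlikely to end in a valid string. Real grammars and real language models would need a far larger state space, which is where the estimator hierarchy comes in.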

📝 Abstract

Grammar-constrained generation is often combined with local vocabulary masking and speculative decoding, but the resulting sampling law is not the grammar-conditional distribution users usually intend. We show that any speculative decoder with local mask access, Leviathan rejection, and rollback soundness samples from the locally projected distribution $μ^{\mathrm{proj}}$ rather than the grammar-conditional distribution $μ^\star$. This extends the GAD impossibility result to speculative decoding; on Dyck grammars with Qwen3-8B, the total-variation gap can reach 0.996. We identify the future-validity function $Φ_t(y)=\Pr_p[\mathrm{valid\ completion}\mid y]$ as the missing correction statistic. The target distribution is a Doob transform of the base model with $h=Φ$, while local masking corresponds to setting $h$ to one. With exact $Φ$, our oracle decoder FVO-Spec samples exactly from $μ^\star$; with approximate $Φ$, we bound the resulting total-variation error. Because exact future validity is hard for general context-free grammars, we evaluate estimator hierarchies on tractable Dyck and finite JSON languages. OneStep reduces Dyck TV by 14% with under 1% throughput overhead, exact dynamic programming reduces it by 97%, and finite-language correction closes JSON gaps to numerical precision. All fidelity claims are scoped to enumerable grammars and token tries.
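The Doob-transform statement can be unpacked with one application of Bayes' rule. The sketch below uses notation introduced here for illustration rather than taken from the paper: $p(a \mid y)$ is the base model's next-token probability, $ya$ is the prefix $y$ extended by token $a$, $t$ is taken to be the length of $y$, and $A(y)$ is the set of tokens the local mask allows after $y$. Conditioning the base model on the event that the completion is valid gives

$$
μ^\star(a \mid y) = \Pr_p[a \mid y,\ \mathrm{valid\ completion}] = \frac{p(a \mid y)\, Φ_{t+1}(ya)}{Φ_t(y)}, \qquad Φ_t(y) = \sum_{a} p(a \mid y)\, Φ_{t+1}(ya).
$$

Under one natural reading of "locally projected", masking replaces the weight $Φ_{t+1}(ya)$ by one on every allowed token and renormalizes:

$$
μ^{\mathrm{proj}}(a \mid y) = \frac{p(a \mid y)\, \mathbf{1}[a \in A(y)]}{\sum_{b \in A(y)} p(b \mid y)}.
$$

The two laws agree at a prefix only when $Φ_{t+1}(ya)$ takes the same value on every allowed token, which is why a gap appears whenever some allowed continuations are much less likely than others to end in a valid string.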

Problem

Research questions and friction points this paper is trying to address.

speculative decoding

grammar-constrained generation

future validity

conditional distribution

total-variation gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

future validity

speculative decoding

grammar-constrained generation