Note on Martingale Theory and Applications

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inconsistency between training and inference in existing speculative decoding methods, where training optimizes only a single greedy path while inference requires verifying multiple sampled paths. To bridge this gap, we introduce variational inference into speculative decoding for the first time, reformulating draft model training as posterior inference over latent proposal paths by maximizing the marginal probability of acceptance under the target model. We propose a path-level utility function, an EM-based optimization framework, and two novel mechanisms: Adaptive Rejection Weighting (ARW) and Confidence-Aware Regularization (CAR). Experiments demonstrate that our approach achieves up to 9.6% higher speedup than EAGLE-3 and a 7.9% improvement in acceptance rate over ViSpec across various large language and multimodal models, significantly enhancing inference efficiency.
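
For context on the verification step the summary refers to: standard speculative sampling (Leviathan et al., 2023) keeps a drafted token with probability min(1, p_target/p_draft) and truncates the path at the first rejection. The sketch below illustrates only that standard rule; the function name and toy distributions are illustrative, and the paper's variational training objective is not shown here.

```python
import numpy as np

def verify_draft(draft_tokens, draft_probs, target_probs, rng):
    """Standard speculative-sampling verification: keep draft token t with
    probability min(1, p_target(t) / p_draft(t)); stop at the first reject.
    Returns how many drafted tokens the target model accepts."""
    accepted = 0
    for t, q, p in zip(draft_tokens, draft_probs, target_probs):
        # q[t] is the draft model's probability of the proposed token,
        # p[t] is the target model's probability of the same token.
        if rng.random() < min(1.0, p[t] / q[t]):
            accepted += 1
        else:
            break  # the first rejection truncates the speculative path
    return accepted

# Toy usage: a 4-token vocabulary and a 3-token draft path.
rng = np.random.default_rng(seed=0)
draft_probs = [np.array([0.7, 0.1, 0.1, 0.1])] * 3
target_probs = [np.array([0.4, 0.3, 0.2, 0.1])] * 3
print(verify_draft([0, 0, 1], draft_probs, target_probs, rng))
```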

📝 Abstract
This note investigates core properties of martingales, emphasizing the measure-theoretic formulation of conditional expectation, the martingale transform, and the upcrossing lemma. These results lead to the Martingale Convergence Theorem, which we then apply to study the extinction behavior in Galton–Watson branching processes.
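
To make the abstract's objects concrete, here is a minimal LaTeX sketch of the standard definitions it builds on; the notation is chosen for illustration and is not taken from the note itself.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
A process $(X_n)$ adapted to a filtration $(\mathcal{F}_n)$ with
$\mathbb{E}|X_n| < \infty$ is a \emph{martingale} if
\[
  \mathbb{E}\bigl[X_{n+1} \mid \mathcal{F}_n\bigr] = X_n
  \quad \text{a.s. for all } n \ge 0.
\]
For a predictable, bounded $(H_n)$, the \emph{martingale transform}
\[
  (H \cdot X)_n = \sum_{k=1}^{n} H_k \,(X_k - X_{k-1})
\]
is again a martingale; combined with the upcrossing lemma, this yields the
Martingale Convergence Theorem. For a Galton--Watson process $(Z_n)$ with
$Z_0 = 1$ and mean offspring number $\mu = \mathbb{E}[Z_1] \in (0,\infty)$,
\[
  M_n = \frac{Z_n}{\mu^{n}}
\]
is a nonnegative martingale, so $M_n$ converges almost surely; when
$\mu \le 1$ (offspring law not degenerate at one child), the process
becomes extinct almost surely.
\end{document}
```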
Problem

Research questions and friction points this paper is trying to address.

speculative decoding
training-decoding discrepancy
draft paths
sequence acceptance
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out (a generic sketch of the EM-style weighting pattern follows this list).

Variational Speculative Decoding
Sequence Acceptance
Latent Proposal
Expectation-Maximization
Adaptive Rejection Weighting
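
The tags above name an EM-based framework with rejection-aware weighting. The card gives no equations, so the following sketch only illustrates the generic pattern such a scheme could follow: an E-step that weights sampled draft paths by an acceptance-style ratio, and an M-step objective that is a weighted log-likelihood. Every name and the weighting formula here are assumptions, not the paper's method.

```python
import numpy as np

def em_style_step(draft_logps, target_logps):
    """Illustrative EM-style pattern only (NOT the paper's exact objective).
    E-step: weight each sampled draft path by an acceptance-style ratio,
    min(1, p_target / p_draft), normalized over the sampled paths.
    M-step: form the weighted log-likelihood whose gradient (w.r.t. the
    draft model's parameters) would be used to update the draft model."""
    # E-step in log space: min(1, ratio) == exp(min(0, log ratio)).
    weights = np.exp(np.minimum(0.0, target_logps - draft_logps))
    weights = weights / weights.sum()
    # M-step objective: sum_i w_i * log q_theta(path_i); the weights are
    # treated as constants (they would be detached from the gradient).
    return float(np.sum(weights * draft_logps))

# Toy usage: three sampled draft paths with made-up log-probabilities.
print(em_style_step(np.array([-2.0, -3.5, -1.2]),
                    np.array([-2.5, -2.0, -3.0])))
```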