📝 Abstract
This note investigates core properties of martingales, emphasizing the measure-theoretic formulation of conditional expectation, the martingale transform, and the upcrossing lemma. These results lead to the Martingale Convergence Theorem, which we then apply to study the extinction behavior of Galton--Watson branching processes.