Traversal Verification for Speculative Tree Decoding

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing speculative decoding relies on layer-wise, token-level verification, resulting in short acceptance lengths and severe waste of candidate paths: (i) single-token probabilities fail to accurately reflect sequence-level distributions; and (ii) top-down verification discards entire subtrees upon rejection of any parent node. This work proposes a novel leaf-to-root, path-level verification paradigm that jointly models the probability of an entire candidate path and enables sequence-level parallel verification via tree-structured computation. We provide theoretical proof that the method exactly reproduces the target model’s distribution—eliminating approximation error inherent in conventional verification frameworks. Experiments across multiple large language models and tasks demonstrate substantial improvements in average acceptance length and throughput, achieving both inference acceleration and high generation quality without compromising distributional fidelity.

📝 Abstract
Speculative decoding is a promising approach for accelerating large language models. The primary idea is to use a lightweight draft model to speculate the output of the target model for multiple subsequent timesteps, and then verify them in parallel to determine whether the drafted tokens should be accepted or rejected. To enhance acceptance rates, existing frameworks typically construct token trees containing multiple candidates in each timestep. However, their reliance on token-level verification mechanisms introduces two critical limitations: First, the probability distribution of a sequence differs from that of individual tokens, leading to suboptimal acceptance length. Second, current verification schemes begin from the root node and proceed layer by layer in a top-down manner. Once a parent node is rejected, all its child nodes must be discarded, resulting in inefficient utilization of speculative candidates. This paper introduces Traversal Verification, a novel speculative decoding algorithm that fundamentally rethinks the verification paradigm through leaf-to-root traversal. Our approach considers the acceptance of the entire token sequence from the current node to the root, and preserves potentially valid subsequences that would be prematurely discarded by existing methods. We theoretically prove that the probability distribution obtained through Traversal Verification is identical to that of the target model, guaranteeing lossless inference while achieving substantial acceleration gains. Experimental results across different large language models and multiple tasks show that our method consistently improves acceptance length and throughput over existing methods.
Problem

Research questions and friction points this paper is trying to address.

Improves speculative decoding acceptance rates via leaf-to-root traversal
Addresses suboptimal token sequence acceptance in existing frameworks
Ensures lossless inference while accelerating large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leaf-to-root traversal verification for speculative decoding
Considers entire token sequence acceptance from node to root
Preserves valid subsequences discarded by existing methods
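As an illustration of the idea in the bullets above, the following is a minimal sketch of leaf-to-root, path-level verification for a single drafted path. The function names, the multiplicative path probabilities, and the per-prefix accept test are simplifying assumptions for illustration; the paper's full algorithm operates on whole token trees and maintains exact (lossless) target-model distributions via corrected residual sampling, which this toy sketch does not implement.

```python
import random

def path_prob(path, probs):
    """Product of per-token probabilities along a root-to-leaf path."""
    p = 1.0
    for tok in path:
        p *= probs[tok]
    return p

def traversal_verify(path, p_target, q_draft, rng=random.random):
    """Walk from the leaf toward the root, returning the longest prefix
    whose sequence-level probability ratio passes a speculative accept
    test. Shorter prefixes are retried instead of being discarded, which
    is the key contrast with top-down token-level verification."""
    for end in range(len(path), 0, -1):  # leaf-to-root: longest prefix first
        prefix = path[:end]
        ratio = path_prob(prefix, p_target) / path_prob(prefix, q_draft)
        if rng() < min(1.0, ratio):      # sequence-level accept test
            return prefix                # keep the accepted subsequence
    return []                            # reject all drafted tokens
```

For example, with a deterministic draw of 0.5, a path whose full-sequence ratio is too low can still have its one-token prefix accepted, preserving a subsequence that top-down verification would throw away along with the rest of the subtree.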
Authors

Yepeng Weng
Researcher, Lenovo Research
Large Language Models, Computer Vision
Qiao Hu
National Center for Mathematics and Interdisciplinary Sciences (NCMIS), AMSS, CAS
Xujie Chen
Lenovo Advanced AI Technology Center, Lenovo
Li Liu
Lenovo Advanced AI Technology Center, Lenovo
Dianwen Mei
Lenovo Advanced AI Technology Center, Lenovo
Huishi Qiu
Lenovo Advanced AI Technology Center, Lenovo
Jiang Tian
Principal Researcher, AI Lab, Lenovo Research
medical imaging processing, deep learning, computer vision, computer graphics, robotics
Zhongchao Shi
Lenovo Advanced AI Technology Center, Lenovo