Goose: Anisotropic Speculation Trees for Training-Free Speculative Decoding

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing training-free speculative decoding methods fail to account for the varying quality of candidate tokens and are constrained by balanced tree structures, limiting their ability to efficiently utilize verification budgets. This work proposes the Adaptive Spine Tree, which—without requiring any additional training—dynamically constructs an anisotropic speculation tree based on n-gram context matching and statistical predictions derived from historical forward passes. High-acceptance context-matched tokens are organized into deep chains, while low-acceptance statistically predicted tokens form wide branches, achieving an optimal trade-off between depth and breadth. Evaluated across five large language models (7B–33B) and five benchmarks, the method achieves lossless speedups of 1.9× to 4.3× and outperforms balanced-tree baselines by 12% to 33% under identical verification budgets.
📝 Abstract
Speculative decoding accelerates large language model inference by drafting multiple candidate tokens and verifying them in a single forward pass. Candidates are organized as a tree: deeper trees accept more tokens per step, but adding depth requires sacrificing breadth (fallback options) under a fixed verification budget. Existing training-free methods draft from a single token source and shape their trees without distinguishing candidate quality across origins. We observe that two common training-free token sources - n-gram matches copied from the input context, and statistical predictions from prior forward passes - differ dramatically in acceptance rate (~6x median gap, range 2-18x across five models and five benchmarks). We prove that when such a quality gap exists, the optimal tree is anisotropic (asymmetric): reliable tokens should form a deep chain while unreliable tokens spread as wide branches, breaking through the depth limit of balanced trees. We realize this structure in GOOSE, a training-free framework that builds an adaptive spine tree - a deep chain of high-acceptance context-matched tokens with wide branches of low-acceptance alternatives at each node. We prove that the number of tokens accepted per step is at least as large as that of either source used alone. On five LLMs (7B-33B) and five benchmarks, GOOSE achieves 1.9-4.3x lossless speedup, outperforming balanced-tree baselines by 12-33% under the same budget.
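The abstract's core mechanism — a deep "spine" of n-gram context-matched tokens with wide branches of statistically predicted fallbacks — can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation; all function names, parameters, and the frequency-based branch heuristic are assumptions.

```python
# Hedged sketch of an anisotropic spine tree built from two training-free
# token sources. Names (ngram_chain, statistical_branches, build_spine_tree)
# and the simple frequency heuristic are illustrative, not from the paper.

from collections import Counter

def ngram_chain(context, n=3, depth=6):
    """Draft a deep chain by copying the tokens that follow the most
    recent n-gram of the context wherever it appeared earlier."""
    key = tuple(context[-n:])
    for i in range(len(context) - n - 1, -1, -1):
        if tuple(context[i:i + n]) == key:
            start = i + n
            return context[start:start + depth]  # high-acceptance spine
    return []

def statistical_branches(history, prefix, k=3):
    """Low-acceptance fallbacks: the k tokens most often observed right
    after `prefix` in previously verified decoding history."""
    counts = Counter(
        history[i + 1] for i in range(len(history) - 1) if history[i] == prefix
    )
    return [tok for tok, _ in counts.most_common(k)]

def build_spine_tree(context, history, n=3, depth=6, k=3):
    """One deep chain (the spine) of context-matched tokens, with up to
    k wide statistical branches attached at each spine node."""
    spine = ngram_chain(context, n, depth)
    tree, prev = [], context[-1]
    for tok in spine:
        tree.append({"token": tok,
                     "branches": statistical_branches(history, prev, k)})
        prev = tok
    return tree
```

Under a fixed verification budget, the depth of the spine and the branch width `k` trade off against each other; the paper's claim is that spending depth on the reliable source and width on the unreliable one beats any balanced allocation.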
Problem

Research questions and friction points this paper is trying to address.

speculative decoding
anisotropic trees
training-free
candidate quality
token acceptance rate
Innovation

Methods, ideas, or system contributions that make the work stand out.

anisotropic speculation trees
training-free speculative decoding
adaptive spine tree
token acceptance rate
large language model acceleration
Tao Jin
Japan Advanced Institute of Science and Technology (JAIST)
Phuong Minh Nguyen
Japan Advanced Institute of Science and Technology (JAIST)
Naoya Inoue
Japan Advanced Institute of Science and Technology (JAIST) / RIKEN AIP
Interpretability/Explainable AI, Commonsense Reasoning, Reading Comprehension, Argumentation