Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation

📅 2026-01-14
🤖 AI Summary
This work addresses slow inference in autoregressive image generation, which stems from the sequential nature of AR models and the ambiguity of image tokens and persists even with speculative decoding. Existing relaxed speculative decoding methods trade quality for speed without theoretical grounding. The paper establishes a theoretical basis for relaxed speculative decoding and introduces COOL-SD, an annealed relaxation. An analysis of the total variation distance between the target model and relaxed speculative decoding yields a resampling distribution that minimizes an upper bound on that distance, and a perturbation analysis reveals an annealing behaviour that motivates adjusting the degree of relaxation during decoding. In experiments, COOL-SD generates images faster at comparable quality, or achieves better quality at similar latency, and shows consistent improvements over prior methods in the speed-quality trade-off.

📝 Abstract
Despite significant progress in autoregressive image generation, inference remains slow due to the sequential nature of AR models and the ambiguity of image tokens, even when using speculative decoding. Recent works attempt to address this with relaxed speculative decoding but lack theoretical grounding. In this paper, we establish the theoretical basis of relaxed SD and propose COOL-SD, an annealed relaxation of speculative decoding built on two key insights. The first analyzes the total variation (TV) distance between the target model and relaxed speculative decoding and yields an optimal resampling distribution that minimizes an upper bound of the distance. The second uses perturbation analysis to reveal an annealing behaviour in relaxed speculative decoding, motivating our annealed design. Together, these insights enable COOL-SD to generate images faster with comparable quality, or achieve better quality at similar latency. Experiments validate the effectiveness of COOL-SD, showing consistent improvements over prior methods in speed-quality trade-offs.
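
For context, the sketch below shows the standard (lossless) speculative sampling step that relaxed variants loosen, together with the total variation distance the abstract uses as its yardstick. It is an illustration under stated assumptions, not COOL-SD itself: the abstract does not give the relaxed acceptance rule, the optimal resampling distribution, or the annealing schedule, so none of those are reproduced here. The function names `tv_distance`, `acceptance_rate`, and `speculative_step` are hypothetical, and the distributions are toy values.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

def acceptance_rate(p, q):
    """Probability that a draft token is accepted in lossless
    speculative sampling: sum_x min(p(x), q(x)) = 1 - TV(p, q)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.minimum(p, q).sum()

def speculative_step(p, q, rng):
    """One token of standard speculative sampling (no relaxation).

    p: target-model distribution, q: draft-model distribution.
    Returns (token, accepted). The returned token is distributed as p.
    """
    x = rng.choice(len(q), p=q)                 # draft proposes a token
    if rng.random() < min(1.0, p[x] / q[x]):    # accept with prob min(1, p/q)
        return x, True
    residual = np.maximum(p - q, 0.0)           # resample from the part of p
    residual /= residual.sum()                  # that q under-covers
    return rng.choice(len(p), p=residual), False

# Toy distributions (assumed values, for illustration only).
p = np.array([0.6, 0.3, 0.1])   # target
q = np.array([0.3, 0.3, 0.4])   # draft
rng = np.random.default_rng(0)

samples = [speculative_step(p, q, rng)[0] for _ in range(20000)]
empirical = np.bincount(samples, minlength=len(p)) / len(samples)
```

In this lossless setting the output distribution matches the target, so the acceptance rate is capped at `1 - tv_distance(p, q)`. Relaxed speculative decoding accepts more draft tokens than this rule allows, which raises speed but moves the output distribution away from the target; the TV distance between the two is the quantity the paper's analysis bounds.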
Problem

Research questions and friction points this paper is trying to address.

autoregressive image generation
speculative decoding
inference speed
relaxed decoding
image tokens
Innovation

Methods, ideas, or system contributions that make the work stand out.

speculative decoding
annealed relaxation
total variation distance
autoregressive image generation
perturbation analysis
Xingyao Li
National University of Singapore
Fengzhuo Zhang
NUS
Cunxiao Du
Research Scientist at Sea AI Lab
NLP, LLM Inference
Hui Ji
National University of Singapore