🤖 AI Summary
This work addresses the inefficiency of autoregressive image generation, which suffers from sequential dependencies and ambiguity in image tokens, leading to slow inference. While existing speculative decoding methods struggle to balance speed and generation quality, this paper introduces COOL-SD, the first approach to establish a theoretical foundation for relaxed speculative decoding. By analyzing total variation distance and perturbation behavior, the authors derive an optimal resampling distribution and incorporate an annealing mechanism to dynamically adjust the degree of relaxation during decoding. The proposed method achieves significant acceleration without compromising visual fidelity, consistently outperforming current state-of-the-art techniques across multiple benchmarks and offering a superior trade-off between inference speed and generation quality.
📝 Abstract
Despite significant progress in autoregressive (AR) image generation, inference remains slow due to the sequential nature of AR models and the ambiguity of image tokens, even when speculative decoding (SD) is used. Recent works attempt to address this with relaxed speculative decoding but lack theoretical grounding. In this paper, we establish the theoretical basis of relaxed SD and propose COOL-SD, an annealed relaxation of speculative decoding built on two key insights. The first analyzes the total variation (TV) distance between the target model's distribution and that of relaxed SD, yielding an optimal resampling distribution that minimizes an upper bound on this distance. The second uses perturbation analysis to reveal an annealing behavior in relaxed SD, motivating our annealed design. Together, these insights enable COOL-SD to generate images faster at comparable quality, or to achieve higher quality at similar latency. Experiments validate the effectiveness of COOL-SD, showing consistent improvements over prior methods in the speed-quality trade-off.
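To make the mechanism concrete, here is a minimal toy sketch of one relaxed speculative-decoding step over a small vocabulary. The `lenience` parameter, the loosened acceptance rule, and the residual resampling distribution are all illustrative assumptions for exposition; they are not COOL-SD's derived optimal resampling distribution or its exact annealing schedule.

```python
import numpy as np

def relaxed_spec_step(p, q, rng, lenience=1.0):
    """One relaxed speculative-decoding step over a toy vocabulary.

    p: target-model distribution, q: draft-model distribution
    (1-D arrays summing to 1). lenience >= 1 loosens the acceptance
    test; lenience == 1 recovers standard speculative decoding.
    This acceptance rule is an illustrative assumption, not the
    paper's derived optimum.
    """
    x = rng.choice(len(q), p=q)  # draft model proposes a token
    accept_prob = min(1.0, lenience * p[x] / q[x])
    if rng.random() < accept_prob:
        return x, True
    # On rejection, resample from the standard residual distribution
    # max(p - q, 0), renormalized (a common baseline choice; COOL-SD
    # instead derives an optimal resampling distribution).
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual), False

# Anneal lenience from loose to strict across decoding positions,
# mimicking the annealed-relaxation idea at a toy scale.
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.3, 0.4, 0.3])
for lam in np.linspace(2.0, 1.0, 5):
    token, accepted = relaxed_spec_step(p, q, rng, lenience=lam)
```

Early positions (large `lenience`) accept draft tokens more readily, trading fidelity for speed; later positions tighten toward the exact speculative-decoding test.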