Grouped Speculative Decoding for Autoregressive Image Generation

📅 2025-08-11
🤖 AI Summary
Autoregressive image generation suffers from slow inference because tokens are decoded sequentially, which hinders practical deployment. To address this, we propose Grouped Speculative Decoding, a training-free inference acceleration framework that exploits the redundancy and diversity of image tokens in the visual-semantic space. Instead of conventional per-token speculative matching, the method dynamically clusters candidate tokens by visual similarity and validates entire groups jointly for acceptance. This group-level verification requires no additional training and significantly reduces rejection rates. Extensive experiments across multiple autoregressive image models demonstrate an average 3.7× inference speedup with no degradation in image fidelity, as quantified by consistent FID and LPIPS scores. The core contribution is generalizing speculative decoding from single-token matching to the verification of semantically coherent token groups, which improves both the practicality and the scalability of these models.

📝 Abstract
Recently, autoregressive (AR) image models have demonstrated remarkable generative capabilities, positioning themselves as a compelling alternative to diffusion models. However, their sequential nature leads to long inference times, limiting their practical scalability. In this work, we introduce Grouped Speculative Decoding (GSD), a novel, training-free acceleration method for AR image models. While recent studies have explored Speculative Decoding (SD) as a means to speed up AR image generation, existing approaches either provide only modest acceleration or require additional training. Our in-depth analysis reveals a fundamental difference between language and image tokens: image tokens exhibit inherent redundancy and diversity, meaning multiple tokens can convey valid semantics. However, traditional SD methods are designed to accept only a single most-likely token, which fails to leverage this difference, leading to excessive false-negative rejections. To address this, we propose a new SD strategy that evaluates clusters of visually valid tokens rather than relying on a single target token. Additionally, we observe that static clustering based on embedding distance is ineffective, which motivates our dynamic GSD approach. Extensive experiments show that GSD accelerates AR image models by an average of 3.7x while preserving image quality, all without requiring any additional training. The source code is available at https://github.com/junhyukso/GSD
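
The contrast between token-level and group-level acceptance can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names, the toy probabilities, and the acceptance rule `min(1, P(group) / Q(group))` are assumptions chosen to mirror the standard speculative decoding test `min(1, p(x) / q(x))`, and the group is supplied by hand rather than found by the paper's dynamic clustering.

```python
import random
from typing import Sequence


def token_accept_prob(p: Sequence[float], q: Sequence[float], x: int) -> float:
    """Standard speculative decoding: accept draft token x with prob min(1, p[x] / q[x])."""
    return min(1.0, p[x] / q[x])


def group_accept_prob(
    p: Sequence[float], q: Sequence[float], group: Sequence[int]
) -> float:
    """Assumed group-level rule: compare the total probability mass that the
    target (p) and the draft (q) assign to the group containing the draft token."""
    p_mass = sum(p[i] for i in group)
    q_mass = sum(q[i] for i in group)
    return min(1.0, p_mass / q_mass)


def accept(prob: float, rng: random.Random) -> bool:
    """Bernoulli acceptance draw with the given probability."""
    return rng.random() < prob


# Hypothetical 4-token vocabulary in which tokens 0 and 1 are assumed to be
# visually interchangeable. The target and draft models agree on the group's
# total mass but split it differently between the two tokens.
p_target = [0.1, 0.5, 0.3, 0.1]
q_draft = [0.5, 0.1, 0.3, 0.1]
draft_token = 0
group = [0, 1]  # hand-picked group containing the draft token

per_token = token_accept_prob(p_target, q_draft, draft_token)
per_group = group_accept_prob(p_target, q_draft, group)
```

In this toy case the per-token test accepts draft token 0 with probability 0.2, while the group-level test accepts it with probability 1.0, which is the kind of false-negative rejection the abstract describes. Note that in this sketch the accepted token inside a group follows the draft distribution, so the output no longer matches the target model token for token.
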
Problem

Research questions and friction points this paper is trying to address.

Accelerates autoregressive image models without training
Reduces false negatives in speculative decoding for images
Dynamically groups visually valid tokens for faster generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Grouped Speculative Decoding for AR image models
Dynamic clustering of visually valid tokens
Training-free acceleration with 3.7x speedup
Authors
Junhyuk So
Department of Computer Science and Engineering, POSTECH
Juncheol Shin
Ph.D. student at POSTECH
Deep Learning, Neural Network Quantization, Neural Architecture Search, Quantized Architecture
Hyunho Kook
Department of Computer Science and Engineering, POSTECH
Eunhyeok Park
POSTECH
Neural network optimization, energy-efficient hardware design