MC-SJD: Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autoregressive (AR) visual generation suffers from slow inference because tokens are decoded sequentially, one at a time. Existing Speculative Jacobi Decoding (SJD) samples draft tokens independently at each iteration, which makes drafts unstable across iterations and lowers the acceptance rate. This paper proposes MC-SJD, a training-free, lossless parallel decoding framework that generates draft tokens through a coupling-based, information-theoretic scheme, maximizing the probability that consecutive iterations sample identical draft tokens. The method leaves the original model unchanged, preserves SJD's lossless property, and requires only a single-line modification to the existing algorithm. Compared with standard AR decoding, it reports up to ~4.2× speedup on image generation and up to ~13.3× on video generation, with no degradation in output quality. The core contribution is the integration of coupled sampling into the SJD framework, combining inference efficiency, generation fidelity, and deployment simplicity.

📝 Abstract
While autoregressive (AR) modeling has recently emerged as a new paradigm in visual generation, its practical adoption is severely constrained by the slow inference speed of per-token generation, which often requires thousands of steps to produce a single sample. To address this challenge, we propose MC-SJD, a training-free, lossless parallel decoding framework designed to accelerate AR visual generation by extending the recently introduced Speculative Jacobi Decoding (SJD). Although SJD shows strong potential for accelerating AR generation, we demonstrate that token instability across iterations significantly reduces the acceptance rate, a limitation that primarily arises from the independent sampling process used during draft token generation. To overcome this, we introduce MC-SJD, an information-theoretic approach based on coupling, which substantially accelerates standard SJD by maximizing the probability of sampling identical draft tokens across consecutive iterations, all while preserving its lossless property. Remarkably, this method requires only a single-line modification to the existing algorithm, yet achieves substantial performance gains, delivering up to a ~4.2x acceleration in image generation and ~13.3x acceleration in video generation compared to standard AR decoding, without any degradation in output quality.
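The coupling idea in the abstract can be illustrated with a generic maximal-coupling sampler. This is a minimal sketch of the textbook construction, not the authors' code: the function name `maximal_coupling_sample`, the toy distributions, and the assumption that `p` and `q` stand for the draft distributions of the same token position in two consecutive iterations are all illustrative choices made here.

```python
import random


def maximal_coupling_sample(p, q, x, rng):
    """Return y distributed as q, given x that was drawn from p.

    The pair (x, y) follows a maximal coupling of p and q, so the
    chance that y equals x is as large as any coupling allows.
    p and q are lists of probabilities over the same vocabulary
    (hypothetical stand-ins for two consecutive draft distributions).
    """
    # Keep the previous token with probability min(1, q[x] / p[x]).
    # p[x] > 0 holds because x was sampled from p.
    if rng.random() * p[x] < q[x]:
        return x
    # Otherwise draw from the residual mass max(q - p, 0), renormalized
    # implicitly by random.choices. Reaching this branch implies
    # q[x] < p[x], so the residual has positive total weight.
    residual = [max(qi - pi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(range(len(q)), weights=residual, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.5, 0.3, 0.2]  # toy distribution from the previous iteration
    q = [0.4, 0.4, 0.2]  # toy distribution from the current iteration
    x = rng.choices(range(3), weights=p, k=1)[0]
    y = maximal_coupling_sample(p, q, x, rng)
    print("previous token:", x, "coupled token:", y)
```

Because `y` still has marginal distribution `q`, swapping an independent draw for a coupled draw of this kind does not change what is sampled at each position, only how often it agrees with the previous iteration.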
Problem

Research questions and friction points this paper is trying to address.

Accelerating slow autoregressive visual generation inference
Overcoming token instability in speculative Jacobi decoding
Maximizing draft token consistency across decoding iterations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximizes the probability of sampling identical draft tokens across consecutive iterations
Uses coupling-based approach to enhance speculative Jacobi decoding
Requires minimal algorithm modification for substantial speedup
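As a worked illustration of why coupling helps, the standard identities below compare the probability that two draws agree under independent sampling and under a maximal coupling. The symbols $p$ and $q$ (draft distributions for one token position at two consecutive iterations) and the toy numbers are assumptions introduced here, not values from the paper.

```latex
% Independent sampling: x ~ p and y ~ q are drawn separately.
\Pr_{\text{indep}}[x = y] = \sum_i p_i \, q_i

% Maximal coupling: the best achievable agreement over all couplings.
\Pr_{\text{max}}[x = y] = \sum_i \min(p_i, q_i)
                        = 1 - \tfrac{1}{2} \sum_i \lvert p_i - q_i \rvert

% Toy example with p = (0.5, 0.3, 0.2) and q = (0.4, 0.4, 0.2):
\Pr_{\text{indep}}[x = y] = 0.20 + 0.12 + 0.04 = 0.36,
\qquad
\Pr_{\text{max}}[x = y] = 0.4 + 0.3 + 0.2 = 0.9
```

When the two distributions are close, as one would expect once Jacobi iterations begin to converge, the coupled agreement probability approaches 1 while the independent one can remain far lower.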
Junhyuk So
Department of Computer Science and Engineering, POSTECH, South Korea
Hyunho Kook
Department of Computer Science and Engineering, POSTECH, South Korea
Chaeyeon Jang
Department of Computer Science and Engineering, POSTECH, South Korea
Eunhyeok Park
POSTECH
neural network optimization, energy efficient hardware design