Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

📅 2025-07-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional autoregressive image generation suffers from high latency and memory overhead due to sequential, token-by-token prediction. To address this, we propose an efficient parallel decoding framework. Our method introduces three key innovations: (1) a flexible, configurable parallel autoregressive architecture that supports arbitrary parallelism and locality-aware generation orders; (2) learnable positional query tokens that explicitly model cross-block dependencies, ensuring contextual visibility among concurrently generated tokens; and (3) a low-dependency grouped scheduling strategy that maximizes parallel efficiency and output consistency without compromising generation quality. Evaluated on ImageNet conditional image generation, our approach reduces the number of generation steps from 256 to 20 for 256×256 images and from 1024 to 48 for 512×512 images. End-to-end latency is reduced by at least 3.4× compared to state-of-the-art parallel methods.
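The "contextual visibility among concurrently generated tokens" described above can be pictured as an attention-mask pattern: prefix tokens attend causally, while the position query tokens appended for one parallel step attend to the full prefix and to each other. The sketch below is an illustrative reconstruction under that assumption; the function name, shapes, and mask layout are hypothetical, not the paper's actual code.

```python
import numpy as np

def lpd_attention_mask(num_prefix, num_queries):
    """Hypothetical mask for one parallel decoding step.

    Already-generated prefix tokens attend causally to each other;
    the learnable position query tokens appended at the end attend to
    the whole prefix AND to one another (mutual visibility), so tokens
    decoded concurrently remain consistent with each other.
    """
    n = num_prefix + num_queries
    mask = np.zeros((n, n), dtype=bool)  # True = attention allowed
    # causal attention within the generated prefix
    mask[:num_prefix, :num_prefix] = np.tril(
        np.ones((num_prefix, num_prefix), dtype=bool))
    # query tokens see the entire prefix
    mask[num_prefix:, :num_prefix] = True
    # mutual visibility among concurrently generated query tokens
    mask[num_prefix:, num_prefix:] = True
    return mask
```

With `num_prefix=3` and `num_queries=2`, the two query rows are fully populated while the prefix block stays lower-triangular, which is what lets a whole group of tokens be predicted in a single forward pass.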

📝 Abstract
We present Locality-aware Parallel Decoding (LPD) to accelerate autoregressive image generation. Traditional autoregressive image generation relies on next-patch prediction, a memory-bound process that leads to high latency. Existing works have tried to parallelize next-patch prediction by shifting to multi-patch prediction to accelerate the process, but only achieved limited parallelization. To achieve high parallelization while maintaining generation quality, we introduce two key techniques: (1) Flexible Parallelized Autoregressive Modeling, a novel architecture that enables arbitrary generation ordering and degrees of parallelization. It uses learnable position query tokens to guide generation at target positions while ensuring mutual visibility among concurrently generated tokens for consistent parallel decoding. (2) Locality-aware Generation Ordering, a novel schedule that forms groups to minimize intra-group dependencies and maximize contextual support, enhancing generation quality. With these designs, we reduce the generation steps from 256 to 20 (256×256 res.) and 1024 to 48 (512×512 res.) without compromising quality on ImageNet class-conditional generation, and achieve at least 3.4× lower latency than previous parallelized autoregressive models.
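The Locality-aware Generation Ordering described in the abstract forms groups whose members have low intra-group dependency. One way to approximate "low dependency" for image tokens is spatial distance: tokens decoded in the same step should be far apart on the grid, so each one leans on already-decoded neighbors rather than on its groupmates. The following is a minimal greedy farthest-point sketch of that idea; it is an assumption-laden illustration, not the paper's exact scheduling algorithm.

```python
import math

def locality_aware_groups(grid_hw, num_steps):
    """Hypothetical greedy schedule: partition an H×W token grid into
    `num_steps` groups so that tokens decoded together are spatially
    far apart (low mutual dependency). Distance is Manhattan distance
    on the grid; the seed choice per group is arbitrary."""
    H, W = grid_hw
    remaining = {(r, c) for r in range(H) for c in range(W)}
    group_size = math.ceil(H * W / num_steps)
    groups = []
    while remaining:
        group = [remaining.pop()]  # arbitrary seed for this step
        while len(group) < group_size and remaining:
            # pick the token whose nearest groupmate is farthest away
            far = max(remaining, key=lambda p: min(
                abs(p[0] - q[0]) + abs(p[1] - q[1]) for q in group))
            remaining.remove(far)
            group.append(far)
        groups.append(group)
    return groups
```

Running this on a 16×16 grid with 20 steps yields roughly 13 tokens per step, mirroring the 256-to-20 step reduction reported above, though the paper's actual grouping criterion may differ.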
Problem

Research questions and friction points this paper is trying to address.

Accelerate autoregressive image generation with high parallelization
Reduce latency while maintaining image generation quality
Minimize intra-group dependencies for consistent parallel decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flexible Parallelized Autoregressive Modeling supports arbitrary generation orders and degrees of parallelism
Locality-aware Generation Ordering minimizes intra-group dependencies
Learnable position query tokens guide generation at target positions