Autoregressive Image Generation with Linear Complexity: A Spatial-Aware Decay Perspective

📅 2025-07-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Autoregressive image generation models predominantly rely on Transformers, which incur O(N²) computational complexity in the sequence length N and substantial memory overhead. Linear attention reduces the complexity to O(N) but neglects the intrinsic 2D spatial structure of images, which impairs long-range dependency modeling and degrades generation quality. To address this, we propose LASADGen, an efficient autoregressive image generation framework built on linear attention. Its core innovation is a spatially aware decay mechanism: learnable decay factors are constructed from true 2D spatial coordinates to explicitly model pairwise 2D distance dependencies, and these are integrated with flattened-sequence positional encodings to enable selective attention to relevant context. Evaluated on ImageNet, LASADGen achieves state-of-the-art generation fidelity under linear complexity, outperforming existing linear-attention approaches and offering a favorable trade-off between inference speed and perceptual quality.

📝 Abstract

Autoregressive (AR) models have garnered significant attention in image generation for their ability to capture both local and global structures within visual data. However, prevalent AR models predominantly rely on transformer architectures, which suffer from computational complexity that is quadratic in the input sequence length and from substantial memory overhead due to the need to maintain key-value caches. Although linear attention mechanisms have successfully reduced this burden in language models, our initial experiments reveal that they significantly degrade image generation quality because they fail to capture critical long-range dependencies in visual data. We propose Linear Attention with Spatial-Aware Decay (LASAD), a novel attention mechanism that explicitly preserves genuine 2D spatial relationships within flattened image sequences by computing position-dependent decay factors based on true 2D spatial location rather than 1D sequence position. Based on this mechanism, we present LASADGen, an autoregressive image generator that enables selective attention to relevant spatial contexts with linear complexity. Experiments on ImageNet show that LASADGen achieves state-of-the-art image generation performance and computational efficiency, bridging the gap between the efficiency of linear attention and the spatial understanding needed for high-quality generation.
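
To make the contrast between 1D sequence position and 2D spatial location concrete, here is a minimal sketch of how the two notions of distance diverge once an image token grid is flattened. It is an illustration under stated assumptions, not the paper's formulation: raster-scan (row-major) ordering, Manhattan distance, the 16-token grid width, and the fixed decay base `gamma = 0.9` are all hypothetical choices, and the helper names are invented for this example.

```python
def to_2d(index, width):
    """Map a raster-scan (row-major) token index to (row, col)."""
    return divmod(index, width)


def seq_distance(i, j):
    """Distance between two tokens measured along the flattened 1D sequence."""
    return abs(i - j)


def grid_distance(i, j, width):
    """Manhattan distance between two tokens on the 2D token grid (assumed metric)."""
    (ri, ci), (rj, cj) = to_2d(i, width), to_2d(j, width)
    return abs(ri - rj) + abs(ci - cj)


def decay_weight(distance, gamma=0.9):
    """Exponential decay: context that is farther away contributes less."""
    return gamma ** distance


WIDTH = 16                      # assumed number of tokens per row

query = 5 * WIDTH + 7           # token at row 5, col 7
above = 4 * WIDTH + 7           # its direct vertical neighbour

# Vertically adjacent tokens are a full row apart in the flattened sequence.
print(seq_distance(query, above), grid_distance(query, above, WIDTH))

# Conversely, the last token of one row and the first token of the next are
# neighbours in the sequence but sit on opposite sides of the image.
row_end, next_row_start = WIDTH - 1, WIDTH
print(seq_distance(row_end, next_row_start),
      grid_distance(row_end, next_row_start, WIDTH))

# A decay driven by 1D distance suppresses the vertical neighbour far more
# than a decay driven by 2D distance does.
print(decay_weight(seq_distance(query, above)),
      decay_weight(grid_distance(query, above, WIDTH)))
```

Under these assumptions, a decay keyed to sequence position treats the pixel directly above the query as distant context, while a decay keyed to grid position treats it as adjacent, which is the mismatch the abstract attributes to 1D decay.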

Problem

Research questions and friction points this paper is trying to address.

Reducing quadratic complexity in autoregressive image models

Preserving spatial relationships in linear attention mechanisms

Improving image generation quality with linear computational cost
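
The complexity trade-off behind these points can be sketched with a generic decayed linear attention, written here in plain Python. This is not the LASAD mechanism: it uses a single scalar decay over 1D sequence position, omits any feature map or normalization, and the function names and `gamma` value are assumptions made for illustration. It shows only why a recurrent formulation needs one fixed-size state update per token instead of a growing key-value cache.

```python
def decayed_linear_attention(queries, keys, values, gamma=0.9):
    """Causal linear attention with a scalar decay, computed recurrently.

    The state is a d_k x d_v matrix updated once per token, so the cost is
    linear in sequence length and memory does not grow with the sequence.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # state <- gamma * state + outer(k, v)
        for a in range(d_k):
            for b in range(d_v):
                state[a][b] = gamma * state[a][b] + k[a] * v[b]
        # output_t = q_t @ state
        outputs.append(
            [sum(q[a] * state[a][b] for a in range(d_k)) for b in range(d_v)]
        )
    return outputs


def decayed_attention_quadratic(queries, keys, values, gamma=0.9):
    """Reference O(N^2) form: o_t = sum over s <= t of gamma^(t-s) (q_t . k_s) v_s."""
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for s in range(t + 1):
            score = gamma ** (t - s) * sum(qa * ka for qa, ka in zip(q, keys[s]))
            for b, vb in enumerate(values[s]):
                out[b] += score * vb
        outputs.append(out)
    return outputs


# Tiny example: 3 tokens, key/query dimension 2, value dimension 3.
q = [[1.0, 0.5], [0.2, -1.0], [0.3, 0.7]]
k = [[0.5, 1.0], [1.0, 0.0], [-0.4, 0.9]]
v = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5], [2.0, 0.1, -0.3]]

fast = decayed_linear_attention(q, k, v)
slow = decayed_attention_quadratic(q, k, v)
max_gap = max(abs(x - y) for fr, sr in zip(fast, slow) for x, y in zip(fr, sr))
print(max_gap)  # the two forms agree up to floating-point rounding
```

Both functions compute the same outputs; the recurrent one visits each token once, while the reference form revisits every earlier token. The decay here depends only on the 1D offset `t - s`, which is the limitation that a spatially aware decay is meant to address.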

Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Attention with Spatial-Aware Decay (LASAD)

Preserves 2D spatial relationships in sequences

Achieves linear complexity and high efficiency
