NFIG: Autoregressive Image Generation with Next-Frequency Prediction

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autoregressive image generation faces three challenges: modeling long-range dependencies, high computational overhead, and a spatial token ordering that lacks semantic meaning. This paper proposes NFIG, a novel frequency-guided autoregressive framework that decomposes image synthesis into a multi-stage, spectrally hierarchical process. It first models low-frequency global structure and then progressively injects high-frequency details, in line with the spectral decay of natural images. NFIG comprises frequency-domain decomposition, band-wise tokenization and prediction, and progressive reconstruction, replacing the conventional raster-scan token sequence. On ImageNet-256, NFIG achieves a state-of-the-art FID of 2.81, outperforming VAR-d20 while sampling 1.25× faster and using fewer sampling steps. The framework thus improves both generation quality and inference efficiency.
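
The summary does not say exactly how NFIG splits an image into frequency bands. As a minimal sketch, assuming ideal radial masks in the 2D FFT domain (an illustrative stand-in, not the paper's decomposition), the bands below partition the spectrum, so they sum back to the original image exactly. The helper names, cutoffs, and toy image are hypothetical.

```python
import numpy as np

def radial_band_masks(h: int, w: int, cutoffs):
    """Hypothetical helper: split the 2D frequency plane into concentric bands.

    cutoffs are increasing radii in cycles/pixel; the last band catches every
    frequency above the final cutoff, so the masks partition the spectrum.
    """
    fy = np.fft.fftfreq(h)[:, None]  # row frequencies
    fx = np.fft.fftfreq(w)[None, :]  # column frequencies
    radius = np.sqrt(fy ** 2 + fx ** 2)
    edges = [0.0, *cutoffs, np.inf]
    return [((radius >= lo) & (radius < hi)).astype(float)
            for lo, hi in zip(edges[:-1], edges[1:])]

def decompose_into_bands(image: np.ndarray, cutoffs):
    """Assumed decomposition: ideal band-pass filtering, lowest band first."""
    spectrum = np.fft.fft2(image)
    masks = radial_band_masks(image.shape[0], image.shape[1], cutoffs)
    # Masks depend only on |f|, so each filtered image is real up to rounding.
    return [np.fft.ifft2(spectrum * m).real for m in masks]

# Toy image: one slow oscillation (global structure) plus a weak fast one (detail).
y, x = np.mgrid[0:32, 0:32]
coarse = np.sin(2 * np.pi * x / 32)          # 1 cycle across the image
fine = 0.1 * np.cos(2 * np.pi * 8 * y / 32)  # 8 cycles across the image
img = coarse + fine

bands = decompose_into_bands(img, cutoffs=[0.1, 0.3])
energy = [float(np.sum(b ** 2)) for b in bands]
print([e / sum(energy) for e in energy])     # energy share per band, low to high
```

On this constructed image about 99% of the energy sits in the lowest band. In an actual generator each band would presumably be tokenized rather than kept as raw pixels.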

📝 Abstract
Autoregressive models have achieved promising results in natural language processing. However, for image generation tasks, they encounter substantial challenges in effectively capturing long-range dependencies, managing computational costs, and most crucially, defining meaningful autoregressive sequences that reflect natural image hierarchies. To address these issues, we present Next-Frequency Image Generation (NFIG), a novel framework that decomposes the image generation process into multiple frequency-guided stages. Our approach first generates low-frequency components to establish global structure with fewer tokens, then progressively adds higher-frequency details, following the natural spectral hierarchy of images. This principled autoregressive sequence not only improves the quality of generated images by better capturing true causal relationships between image components, but also significantly reduces computational overhead during inference. Extensive experiments demonstrate that NFIG achieves state-of-the-art performance with fewer steps, offering a more efficient solution for image generation, with 1.25× speedup compared to VAR-d20 while achieving better performance (FID: 2.81) on the ImageNet-256 benchmark. We hope that our insight of incorporating frequency-domain knowledge to guide autoregressive sequence design will shed light on future research. We will make our code publicly available upon acceptance of the paper.
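
One way to read the "next-frequency" ordering is to write the image x as a sum of K frequency bands b_1 (lowest) through b_K (highest) and to factorize over bands instead of over N raster-ordered tokens t_i. This is an interpretation of the abstract under that assumption, not an equation stated in it:

```latex
p_{\text{raster}}(x) = \prod_{i=1}^{N} p\left(t_i \mid t_1, \dots, t_{i-1}\right)
\qquad \text{vs.} \qquad
p_{\text{freq}}(x) = \prod_{k=1}^{K} p\left(b_k \mid b_1, \dots, b_{k-1}\right),
\quad x = \sum_{k=1}^{K} b_k
```

If all tokens of a band are predicted in one pass, sampling needs K sequential steps instead of N, and the earliest steps fix global structure before any fine detail is committed.
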
Problem

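Addresses three challenges in autoregressive image generation: capturing long-range dependencies, managing computational cost, and defining a token order that reflects natural image hierarchies.
Improves image quality by ordering generation to follow the true causal relationships between image components.
Reduces inference overhead by generating the image in frequency-guided stages that need fewer sampling steps.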
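
Simple arithmetic illustrates where the step savings come from. The sketch below assumes a 16×16 token grid and a hypothetical coarse-to-fine schedule of 1, 2, 4, 8, and 16 tokens per side, with all tokens of a stage emitted in one forward pass. NFIG's actual per-band token counts are not given here, and both function names are illustrative.

```python
from typing import Sequence, Tuple

def raster_scan_cost(side: int) -> Tuple[int, int]:
    """Next-token raster scan: one token per forward pass over a side x side grid."""
    tokens = side * side
    return tokens, tokens  # (sequential forward passes, total tokens generated)

def staged_cost(stage_sides: Sequence[int]) -> Tuple[int, int]:
    """Hypothetical stage-wise schedule: each stage's tokens come from one pass."""
    steps = len(stage_sides)
    tokens = sum(s * s for s in stage_sides)
    return steps, tokens

print(raster_scan_cost(16))              # (256, 256)
print(staged_cost([1, 2, 4, 8, 16]))     # (5, 341)
```

The staged schedule generates more tokens in total (341 versus 256) but needs only 5 sequential passes instead of 256. This count ignores the cost of each pass. The reported 1.25× speedup is measured against VAR-d20, which is itself a multi-stage (next-scale) model, so this comparison with raster scanning does not by itself account for that figure.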
Innovation

Decomposes image generation into frequency-guided stages
Generates low-frequency components first for global structure
Progressively adds higher-frequency details for image refinement
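
Taken together, these three points describe a low-to-high-frequency generation loop. The sketch below makes several assumptions. `BandPredictor` is a hypothetical interface standing in for the model: each stage sees the canvas built from earlier bands and adds the next band on top. A toy oracle that returns fixed target bands replaces a trained network, so the example shows only the control flow and progressive reconstruction.

```python
from typing import Callable, List, Tuple
import numpy as np

# Hypothetical interface: given the canvas reconstructed from bands 0..k-1 and
# the stage index k, return band k as a full-resolution array.
BandPredictor = Callable[[np.ndarray, int], np.ndarray]

def generate_low_to_high(predict_band: BandPredictor, num_bands: int,
                         shape: Tuple[int, int]) -> List[np.ndarray]:
    """Run stages from lowest to highest frequency; return the canvas after each."""
    canvas = np.zeros(shape)
    history = []
    for k in range(num_bands):
        band = predict_band(canvas, k)  # stage k only sees earlier bands
        canvas = canvas + band          # progressive reconstruction: add detail
        history.append(canvas.copy())
    return history

# Toy target made of three bands, ordered from global structure to fine detail.
y, x = np.mgrid[0:32, 0:32]
target_bands = [
    np.sin(2 * np.pi * x / 32),                     # global structure
    0.3 * np.cos(2 * np.pi * 4 * y / 32),           # mid-frequency pattern
    0.05 * np.sin(2 * np.pi * 12 * (x + y) / 32),   # fine detail
]
target = sum(target_bands)

def oracle(canvas: np.ndarray, k: int) -> np.ndarray:
    """Stand-in for a trained model: ignores its context, returns the true band."""
    return target_bands[k]

stages = generate_low_to_high(oracle, len(target_bands), target.shape)
errors = [float(np.abs(s - target).max()) for s in stages]
print(errors)  # max error shrinks as each higher-frequency band is added
```

Conditioning on the running canvas instead of on the list of earlier bands loses no information when the bands occupy disjoint frequency ranges, because each band can be recovered from the canvas by filtering.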
Zhihao Huang
NWPU
Computer Science
Xi Qiu
TeleAI
Yukuo Ma
Beihang University, TeleAI
Yifu Zhou
Northwestern Polytechnical University, TeleAI
Chi Zhang
TeleAI
Xuelong Li
TeleAI