Distilling semantically aware orders for autoregressive image generation

📅 2025-04-23
🤖 AI Summary
In autoregressive image generation, the fixed raster-scan order ignores the semantic causality inherent in image content (e.g., the color of clouds depends on the position and color of the sun), which can lead to logically inconsistent generations. To address this, we propose a **content-driven, semantically aware framework for learning the generation order**, which treats the patch generation order as a learnable latent variable rather than a fixed schedule. The method first trains an autoregressive model to generate patches in any given order, then distills the orders extracted from that model back into it through finetuning, so that the sampling sequence reflects the image's intrinsic causal structure. It requires no manual annotations or external supervision, and it infers both the content and the location of each patch during generation. Evaluated on two datasets, our approach outperforms the raster-scan baseline, reducing FID by 12.3% and improving LPIPS by 8.7%, with comparable training cost. This demonstrates both the effectiveness and the feasibility of semantically ordered generation for improving image quality.

📝 Abstract
Autoregressive patch-based image generation has recently shown competitive results in terms of image quality and scalability. It can also be easily integrated and scaled within Vision-Language models. Nevertheless, autoregressive models require a defined order for patch generation. While a natural order based on the dictation of the words makes sense for text generation, there is no inherent generation order for images. Traditionally, a raster-scan order (from top-left to bottom-right) guides autoregressive image generation models. In this paper, we argue that this order is suboptimal, as it fails to respect the causality of the image content: for instance, when conditioned on a visual description of a sunset, an autoregressive model may generate clouds before the sun, even though the color of the clouds should depend on the color of the sun and not the inverse. In this work, we first show that by training a model to generate patches in any given order, we can infer both the content and the location (order) of each patch during generation. Second, we use these extracted orders to finetune the any-given-order model to produce better-quality images. Our experiments on two datasets show that this new generation method produces better images than the traditional raster-scan approach, with similar training costs and no extra annotations.
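The difference between raster-scan and any-given-order generation can be written as two factorizations of the same joint distribution. The notation here is an assumption made for illustration and is not taken from the paper: $x_1, \dots, x_N$ are the image patches indexed in raster order, $c$ is an optional conditioning signal such as a text description, and $\sigma$ is a permutation of $\{1, \dots, N\}$ giving the order in which patches are generated.

```latex
% Raster-scan order: patches are generated in index order.
p(x \mid c) = \prod_{t=1}^{N} p\big(x_{t} \mid x_{1}, \dots, x_{t-1},\, c\big)

% Any given order \sigma: the same chain rule, applied along the permutation.
p(x \mid c) = \prod_{t=1}^{N} p\big(x_{\sigma(t)} \mid x_{\sigma(1)}, \dots, x_{\sigma(t-1)},\, c\big)
```

Raster-scan generation is the special case where $\sigma$ is the identity. Both products are exact by the chain rule, but a learned model approximates each conditional, so the choice of $\sigma$ can affect sample quality. In the sunset example, an order that places the sun's patches before the clouds' patches lets the cloud conditionals depend on the sun.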
Problem

Research questions and friction points this paper is trying to address.

Autoregressive image generation lacks optimal patch order
Raster-scan order ignores content causality in images
Proposing order-aware model for better image quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains model to generate patches in any-given-order
Infers content and location of patches dynamically
Finetunes model with extracted orders for better quality
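The first two contributions can be illustrated with a toy sketch. It is not the authors' implementation: `toy_conditional` is a hypothetical stand-in for a trained any-given-order model, the grid and codebook sizes are arbitrary, and the rule for picking the next location (lowest predictive entropy) is an assumption chosen only to make the idea of inferring location alongside content concrete.

```python
import math
import random

GRID = 4                   # hypothetical 4x4 patch grid
NUM_PATCHES = GRID * GRID
VOCAB = 8                  # hypothetical codebook size


def toy_conditional(tokens, pos):
    """Stand-in for a trained any-order model: p(token at pos | generated patches).

    The distribution sharpens when neighbours of `pos` are already generated,
    mimicking a model that is more certain about patches with more context.
    """
    row, col = divmod(pos, GRID)
    logits = [0.0] * VOCAB
    for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)):
        if 0 <= r < GRID and 0 <= c < GRID:
            tok = tokens[r * GRID + c]
            if tok is not None:
                logits[tok] += 1.5  # favour agreeing with generated neighbours
    z = sum(math.exp(v) for v in logits)
    return [math.exp(v) / z for v in logits]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def generate(rng, order=None):
    """Generate every patch; follow `order` if given, otherwise infer it."""
    tokens = [None] * NUM_PATCHES
    used_order = []
    remaining = set(range(NUM_PATCHES))
    for step in range(NUM_PATCHES):
        if order is not None:
            pos = order[step]
        else:
            # Assumed selection rule: the location the model is most certain about.
            pos = min(sorted(remaining),
                      key=lambda p: entropy(toy_conditional(tokens, p)))
        probs = toy_conditional(tokens, pos)
        tokens[pos] = rng.choices(range(VOCAB), weights=probs, k=1)[0]
        remaining.remove(pos)
        used_order.append(pos)
    return tokens, used_order


raster = list(range(NUM_PATCHES))  # top-left to bottom-right
_, raster_order = generate(random.Random(0), order=raster)
_, inferred_order = generate(random.Random(0))
print("raster order:  ", raster_order)
print("inferred order:", inferred_order)
```

The same model serves both calls: with `order` supplied it behaves like a fixed-order generator, and without it the generation order becomes an output. Orders extracted this way would then be the input to the finetuning stage described in the last item, which is not sketched here.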
Authors
Rishav Pramanik
PhD Student, Stony Brook University, New York, USA
Artificial Intelligence, Machine Learning, Computer Vision, Multimodal Learning
Antoine Poupon
International Laboratory on Learning Systems, Université Paris-Saclay, CentraleSupélec, France
Juan A. Rodriguez
Mila - Quebec AI Institute, ETS, ServiceNow Research, ILLS
Artificial Intelligence, Deep Learning, Computer Vision, Multimodal AI, Scalable Vector Graphics
Masih Aminbeidokhti
PhD Candidate, École de technologie supérieure
Deep learning, out-of-distribution generalization
David Vazquez
ServiceNow Research
Christopher Pal
ServiceNow Research, Mila-Quebec AI Institute, Polytechnique Montréal, Canada CIFAR AI Chair
Zhaozheng Yin
SUNY Empire Innovation Associate Professor, Stony Brook University
Computer vision, pattern recognition, biomedical imaging processing
Marco Pedersoli
International Laboratory on Learning Systems, ServiceNow Research, Mila-Quebec AI Institute, École de technologie supérieure, QC, Canada