Spanning Tree Autoregressive Visual Generation

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual autoregressive (AR) image generation faces a trade-off: exposing the model to randomly permuted token orders provides bidirectional context but ignores spatial priors such as center bias and locality, whereas a fixed raster-scan order respects those priors yet hinders flexible, region-specific editing. This work proposes Spanning Tree Autoregressive (STAR) modeling, an AR framework that builds a structured token sequence from the breadth-first traversal of a uniform spanning tree sampled over the image patch lattice. STAR builds spatial inductive biases into the sequence order itself while remaining compatible with standard language-style AR architectures. Rejection sampling yields a spanning tree whose traversal places a connected, partially observed region at the start of the sequence, so the remaining tokens can be generated as a suffix completion and the seed region or edit mask can be chosen at inference. Compared with random permutation, the authors report that STAR retains this completion capability while maintaining sampling performance.

📝 Abstract
We present Spanning Tree Autoregressive (STAR) modeling, which incorporates prior knowledge of images, such as center bias and locality, to maintain sampling performance while keeping sequence orders flexible enough to accommodate image editing at inference. Approaches that expose conventional autoregressive (AR) models to randomly permuted sequence orders in visual generation, in order to obtain bidirectional context, either suffer a decline in performance or compromise the flexibility of sequence order choice at inference. Instead, STAR uses traversal orders of uniform spanning trees sampled in a lattice defined by the positions of image patches. Traversal orders are obtained through breadth-first search, and rejection sampling lets us efficiently construct a spanning tree whose traversal order places a connected partial observation of the image as a prefix of the sequence. With a randomization strategy that is more structured than random permutation, STAR preserves the capability of postfix completion while maintaining sampling performance, without any significant changes to the model architecture widely adopted in language AR modeling.
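The ordering idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the uniform spanning tree is sampled, so Wilson's algorithm (loop-erased random walks) is used here as an assumption. The grid size, the choice of a central root, the random child order, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


def neighbors(cell, h, w):
    """Return the 4-connected lattice neighbors of a patch position."""
    r, c = cell
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < h and 0 <= c + dc < w]


def sample_uniform_spanning_tree(h, w, root, rng):
    """Wilson's algorithm (an assumed sampler, not confirmed by the source).

    Returns a parent map {cell: parent}; the root maps to None.
    """
    parent = {root: None}
    for start in ((r, c) for r in range(h) for c in range(w)):
        if start in parent:
            continue
        # Random walk until the tree is hit. Overwriting the exit pointer
        # on each revisit erases loops implicitly.
        exit_to = {}
        cell = start
        while cell not in parent:
            exit_to[cell] = rng.choice(neighbors(cell, h, w))
            cell = exit_to[cell]
        # Retrace the loop-erased path and attach it to the tree.
        cell = start
        while cell not in parent:
            parent[cell] = exit_to[cell]
            cell = exit_to[cell]
    return parent


def bfs_order(parent, root, rng):
    """Breadth-first traversal of the rooted tree; sibling order is random."""
    children = {}
    for cell, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(cell)
    order, queue = [], deque([root])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        kids = children.get(cell, [])
        rng.shuffle(kids)
        queue.extend(kids)
    return order


rng = random.Random(0)
H, W, ROOT = 4, 4, (2, 2)  # hypothetical 4x4 patch grid, root near the center
parent = sample_uniform_spanning_tree(H, W, ROOT, rng)
order = bfs_order(parent, ROOT, rng)
print(order)
```

In this sketch every patch after the first is adjacent in the lattice to a patch that appears earlier in the sequence, which is one way to read the locality prior, and rooting the tree near the center makes central patches come first.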
Problem

Research questions and friction points this paper is trying to address.

Incorporating image priors like center bias and locality into autoregressive visual generation
Maintaining sampling performance while enabling flexible sequence orders for image editing
Overcoming performance decline in bidirectional context modeling with structured randomized strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

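Spanning Tree Autoregressive (STAR) modeling builds image priors such as center bias and locality into the sequence order
Uses breadth-first traversal orders of uniform spanning trees sampled over the image patch lattice
Maintains sampling performance while keeping sequence orders flexible enough for image editing at inference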
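The prefix property that enables editing can be sketched as a simple rejection loop: keep sampling trees until the breadth-first order starts with exactly the observed patches. This is one plausible reading of the abstract, not the authors' procedure. The Aldous-Broder random-walk sampler, the random sibling order, the acceptance test, and every name below are assumptions made for illustration.

```python
import random
from collections import deque


def grid_neighbors(cell, h, w):
    """Return the 4-connected lattice neighbors of a patch position."""
    r, c = cell
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return [(r + dr, c + dc) for dr, dc in steps
            if 0 <= r + dr < h and 0 <= c + dc < w]


def aldous_broder_tree(h, w, root, rng):
    """Uniform spanning tree from a random walk (assumed sampler).

    The edge by which each cell is first entered becomes its parent edge.
    """
    parent = {root: None}
    cell = root
    while len(parent) < h * w:
        nxt = rng.choice(grid_neighbors(cell, h, w))
        if nxt not in parent:
            parent[nxt] = cell
        cell = nxt
    return parent


def bfs_order(parent, root, rng):
    """Breadth-first traversal of the rooted tree; sibling order is random."""
    children = {}
    for cell, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(cell)
    order, queue = [], deque([root])
    while queue:
        cell = queue.popleft()
        order.append(cell)
        kids = children.get(cell, [])
        rng.shuffle(kids)
        queue.extend(kids)
    return order


def sample_order_with_prefix(h, w, observed, root, rng, max_tries=10000):
    """Resample until the observed patches form the prefix of the order."""
    observed = set(observed)
    if root not in observed:
        raise ValueError("root must lie inside the observed region")
    for tries in range(1, max_tries + 1):
        parent = aldous_broder_tree(h, w, root, rng)
        order = bfs_order(parent, root, rng)
        if set(order[:len(observed)]) == observed:
            return order, tries
    raise RuntimeError("no accepted order within max_tries")


rng = random.Random(0)
block = {(1, 1), (1, 2), (2, 1), (2, 2)}  # hypothetical observed 2x2 region
order, tries = sample_order_with_prefix(4, 4, block, (1, 1), rng)
print(order[:4], "accepted after", tries, "tries")
```

Once an order is accepted, the observed patches occupy the first positions of the sequence, so a standard AR model could condition on them and generate the remaining patches as a completion. The acceptance rate of this naive loop drops as the observed region grows, so it should be read as a toy illustration only.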