🤖 AI Summary
Autoregressive image generation struggles to balance fine-grained inter-pixel dependency modeling with efficient parallel sampling. This work proposes a multi-scale autoregressive approach based on a progressive checkerboard ordering: a flexible, fixed sampling order that draws parallel samples from evenly spaced regions while keeping every level of a quadtree subdivision balanced at each step, enabling effective conditioning both within and across scales. Notably, the authors find that when the total number of sequential sampling steps is held constant, a wide range of scale-up factors yield comparable performance. On class-conditional ImageNet, the model matches the generation quality of state-of-the-art autoregressive models of similar capacity while requiring significantly fewer sampling steps.
📝 Abstract
A key challenge in autoregressive image generation is to efficiently sample independent locations in parallel, while still modeling mutual dependencies with serial conditioning. Some recent works have addressed this by conditioning between scales in a multiscale pyramid. Others have looked at parallelizing samples in a single image using regular partitions or randomized orders. In this work, we examine a flexible, fixed ordering based on progressive checkerboards for multiscale autoregressive image generation. Our ordering draws samples in parallel from evenly spaced regions at each scale, maintaining full balance in all levels of a quadtree subdivision at each step. This enables effective conditioning both between and within scales. Intriguingly, we find evidence that in our balanced setting, a wide range of scale-up factors lead to similar results, so long as the total number of serial steps is constant. On class-conditional ImageNet, our method achieves competitive performance compared to recent state-of-the-art autoregressive systems of comparable model capacity, using fewer sampling steps.
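The paper itself does not publish its ordering as code here, but the balance property the abstract describes can be illustrated with the classic ordered-dither (Bayer) recursion, which produces a progressive checkerboard: the entry at each position gives the step at which it is sampled, steps cycle round-robin over the four quadrants, and every prefix of the ordering stays balanced (counts differ by at most one) at every level of the quadtree. This is a sketch of that construction, not the authors' exact method; the function names are illustrative.

```python
import numpy as np

def checkerboard_order(n):
    """Return an n x n (n a power of two) matrix whose entry (i, j) is the
    step at which position (i, j) is sampled. Each quadrant reuses the
    coarser ordering, offset so that consecutive steps cycle round-robin
    over the four quadrants (the ordered-dither / Bayer recursion)."""
    if n == 1:
        return np.zeros((1, 1), dtype=int)
    b = checkerboard_order(n // 2)
    return np.block([[4 * b,     4 * b + 2],
                     [4 * b + 3, 4 * b + 1]])

def is_quadtree_balanced(order):
    """Check that every prefix of the ordering places counts in the four
    quadrants that differ by at most one, recursively down the quadtree."""
    n = order.shape[0]
    if n == 1:
        return True
    h = n // 2
    quads = [order[:h, :h], order[:h, h:], order[h:, :h], order[h:, h:]]
    for t in range(n * n):
        counts = [int((q < t).sum()) for q in quads]
        if max(counts) - min(counts) > 1:
            return False
    # Each quadrant, relabeled to its own step numbers, is again the
    # coarser ordering, so the same property must hold recursively.
    return all(is_quadtree_balanced(q // 4) for q in quads)
```

For example, `checkerboard_order(4)` places its first four samples at the four even-coordinate corners `(0,0), (0,2), (2,0), (2,2)` — one per quadrant — matching the abstract's "evenly spaced regions at each scale." In a generative model, positions sharing contiguous step ranges could be sampled in parallel, conditioned on all earlier steps.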