Continuous Visual Autoregressive Generation via Score Maximization

📅 2025-05-12
🤖 AI Summary
Autoregressive visual generation models typically rely on vector quantization to map images into a discrete space, which can cause significant information loss. This paper proposes a Continuous VAR framework that performs visual autoregressive generation directly on continuous data, without vector quantization. The framework is grounded in strictly proper scoring rules: any strictly proper score can be selected as the training objective. The paper focuses on objectives based on the energy score, which is likelihood-free and therefore avoids the difficulty of making explicit probabilistic predictions in continuous space. Earlier continuous autoregressive approaches, such as GIVT and diffusion loss, can be derived from the same framework using other strictly proper scores. Evaluations on benchmarks such as ImageNet are reported to show improved generation quality and closer alignment with the target data distribution.

📝 Abstract
Conventional wisdom suggests that autoregressive models are used to process discrete data. When applied to continuous modalities such as visual data, Visual AutoRegressive modeling (VAR) typically resorts to quantization-based approaches to cast the data into a discrete space, which can introduce significant information loss. To tackle this issue, we introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization. The underlying theoretical foundation is strictly proper scoring rules, which provide powerful statistical tools capable of evaluating how well a generative model approximates the true distribution. Within this framework, all we need is to select a strictly proper score and set it as the training objective to optimize. We primarily explore a class of training objectives based on the energy score, which is likelihood-free and thus overcomes the difficulty of making probabilistic predictions in the continuous space. Previous efforts on continuous autoregressive generation, such as GIVT and diffusion loss, can also be derived from our framework using other strictly proper scores. Source code: https://github.com/shaochenze/EAR.
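
For reference, the standard statistical definitions behind the abstract's terminology can be written as follows. The notation (S, p, q, alpha) is an assumption chosen here for illustration and follows the common convention in which higher scores are better; it is not necessarily the paper's own notation.

```latex
% Expected score of a model p when the data are drawn from q
S(p, q) = \mathbb{E}_{y \sim q}\,\bigl[\, S(p, y) \,\bigr]

% S is proper if the true distribution maximizes the expected score,
% and strictly proper if it is the unique maximizer:
S(q, q) \ge S(p, q) \quad \text{for all } p,
\qquad
S(q, q) = S(p, q) \iff p = q

% Energy score with exponent \alpha \in (0, 2),
% where x and x' are independent samples from p:
S_{\mathrm{energy}}(p, y)
  = \tfrac{1}{2}\,\mathbb{E}\,\lVert x - x' \rVert^{\alpha}
  - \mathbb{E}\,\lVert x - y \rVert^{\alpha}
```

Only samples from p appear in the energy score, never the density p(y), which is why the abstract can describe the resulting objective as likelihood-free.
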
Problem

Research questions and friction points this paper is trying to address.

Visual autoregressive models typically rely on quantization, which can introduce significant information loss
Making probabilistic predictions is difficult in continuous space
How to train autoregressive models directly on continuous visual data using strictly proper scoring rules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous VAR framework avoids vector quantization
Uses strictly proper scoring rules for training
Energy score enables likelihood-free continuous predictions
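
To make the likelihood-free idea concrete, here is a minimal sketch of a Monte Carlo estimate of the negative energy score for a single target vector. It assumes the model can draw several independent samples for the same position; the function name `energy_score_loss`, its arguments, and the pure-Python form are hypothetical choices for illustration and are not taken from the paper's repository.

```python
import math
from itertools import combinations


def energy_score_loss(samples, target, alpha=1.0):
    """Estimate the negative energy score from model samples (illustrative only).

    samples: list of at least two equal-length sequences drawn from the model.
    target:  the observed continuous vector, same length as each sample.
    alpha:   exponent in (0, 2), the range where the energy score is strictly proper.
    """
    m = len(samples)
    if m < 2:
        raise ValueError("need at least two samples to estimate the pairwise term")
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 0 and 2")

    # Fidelity term: average distance from each sample to the target.
    fidelity = sum(math.dist(s, target) ** alpha for s in samples) / m

    # Diversity term: average distance over all distinct pairs of samples.
    n_pairs = m * (m - 1) / 2
    diversity = sum(
        math.dist(a, b) ** alpha for a, b in combinations(samples, 2)
    ) / n_pairs

    # Minimizing this loss is equivalent to maximizing the energy score.
    return fidelity - 0.5 * diversity


near = energy_score_loss([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0))
far = energy_score_loss([(0.0, 0.0), (2.0, 0.0)], (10.0, 0.0))
print(near, far)
```

The loss needs no density evaluation, only distances between samples and the target. A real training loop would presumably compute the same two terms with batched, differentiable tensor operations rather than Python lists.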