Fast Autoregressive Models for Continuous Latent Generation

📅 2025-04-24
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing autoregressive image generation models rely on diffusion-based heads that perform iterative denoising in the continuous domain, resulting in low inference efficiency. This paper proposes the Fast AutoRegressive model (FAR), the first framework enabling efficient autoregressive image generation directly over continuous tokens, without requiring discrete quantization. Its core innovation is a lightweight shortcut head that replaces the conventional diffusion head, reducing many-step denoising to a few sampling steps while seamlessly extending causal Transformers to continuous space. Experiments demonstrate that FAR achieves a 2.3× inference speedup while matching state-of-the-art methods in both FID and Inception Score (IS). By preserving generation quality without sacrificing speed, FAR unifies high fidelity and computational efficiency in continuous-domain autoregressive modeling.

๐Ÿ“ Abstract
Autoregressive models have demonstrated remarkable success in sequential data generation, particularly in NLP, but their extension to continuous-domain image generation presents significant challenges. Recent work, the masked autoregressive model (MAR), bypasses quantization by modeling per-token distributions in continuous spaces using a diffusion head but suffers from slow inference due to the high computational cost of the iterative denoising process. To address this, we propose the Fast AutoRegressive model (FAR), a novel framework that replaces MAR's diffusion head with a lightweight shortcut head, enabling efficient few-step sampling while preserving autoregressive principles. Additionally, FAR seamlessly integrates with causal Transformers, extending them from discrete to continuous token generation without requiring architectural modifications. Experiments demonstrate that FAR achieves $2.3\times$ faster inference than MAR while maintaining competitive FID and IS scores. This work establishes the first efficient autoregressive paradigm for high-fidelity continuous-space image generation, bridging the critical gap between quality and scalability in visual autoregressive modeling.
Problem

Research questions and friction points this paper is trying to address.

Slow inference in continuous-domain autoregressive image generation
High computational cost in iterative denoising processes
Bridging quality and scalability in visual autoregressive modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces diffusion head with lightweight shortcut head
Integrates with causal Transformers seamlessly
Enables efficient few-step continuous token generation
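The shortcut-head idea above can be sketched in a few lines. This is a minimal illustrative mock, not the paper's implementation: the head (here a random linear map standing in for a trained network) predicts a velocity conditioned on the current sample, the Transformer's per-token condition vector, the time, and the step size; because the step size is an input, a few large Euler steps can replace many small denoising steps. All names and shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Stand-in for a trained lightweight head: in the real model this would
# be a small neural network; here it is a random linear map over the
# concatenated inputs, purely for illustration.
W = rng.normal(scale=0.1, size=(DIM * 2 + 2, DIM))

def shortcut_head(x, cond, t, d):
    """Predict a velocity from noisy sample x, the causal Transformer's
    condition vector, the current time t, and the step size d."""
    feats = np.concatenate([x, cond, [t], [d]])
    return feats @ W

def sample_token(cond, num_steps=4):
    """Few-step sampling of one continuous token: integrate the predicted
    velocity from t=0 (pure noise) to t=1 in `num_steps` large steps."""
    x = rng.normal(size=DIM)   # start from Gaussian noise
    d = 1.0 / num_steps        # large step size, e.g. 1/4
    for k in range(num_steps):
        t = k * d
        x = x + d * shortcut_head(x, cond, t, d)  # Euler update
    return x

cond = rng.normal(size=DIM)    # would come from the autoregressive Transformer
token = sample_token(cond, num_steps=4)
print(token.shape)  # (8,)
```

A diffusion head would instead run tens to hundreds of denoising steps per token; conditioning on the step size is what lets the head take a handful of steps while still producing a continuous (unquantized) token.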
🔎 Similar Papers
No similar papers found.