🤖 AI Summary
Visual autoregressive models face a fundamental trade-off between unconditional generation quality and fidelity to conditional prompts. Method: This paper proposes a spectrum-attenuation guidance framework that requires no retraining, architectural modification, or prompt-specific design. Its core innovation is the first integration of invertible spectral transforms into autoregressive guidance, featuring a channel-wise spectral selection strategy and two spectral renormalization techniques for controllable information attenuation. It further establishes a classifier-free guidance paradigm compatible with both discrete and continuous autoregressive models. Results: Extensive experiments under diverse text and class conditioning demonstrate that the method significantly improves unconditional sample quality while maintaining strong prompt alignment, thereby enhancing both generative flexibility and training stability.
📝 Abstract
Classifier-free guidance (CFG) has become a widely adopted and practical approach for enhancing generation quality and improving condition alignment. Recent studies have explored guidance mechanisms for unconditional generation, yet these approaches remain fundamentally tied to assumptions specific to diffusion models. In this work, we propose a spectrum-weakening framework for visual autoregressive (AR) models. The method requires no retraining, condition-specific design, or architectural modification. It achieves this by constructing a controllable weak model in the spectral domain. We theoretically show that invertible spectral transformations preserve information, while selectively retaining only a subset of the spectrum introduces controlled information reduction. Based on this insight, we perform spectrum selection along the channel dimension of internal representations, which avoids the structural constraints imposed by diffusion models. We further introduce two spectrum renormalization strategies that ensure numerical stability during the weakening process. Extensive experiments were conducted on both discrete and continuous AR models, with either text or class conditioning. The results demonstrate that our method enables high-quality unconditional generation while maintaining strong prompt alignment for conditional generation.
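The pipeline the abstract describes can be sketched in a few lines: apply an invertible transform along the channel dimension, retain a subset of the spectrum, renormalize for stability, invert the transform to obtain a "weak" representation, then extrapolate away from it CFG-style. The following is a minimal NumPy sketch under stated assumptions: it uses an FFT as the invertible transform, low-frequency retention as the selection rule, and energy matching as one possible renormalization; all function names and choices here are illustrative, not the paper's actual implementation.

```python
import numpy as np

def spectrum_weaken(h, keep_ratio=0.5):
    """Construct a weakened copy of hidden states h (batch, channels)
    by keeping only a fraction of the channel-wise spectrum."""
    # Invertible transform along the channel dimension (assumption: FFT).
    spec = np.fft.rfft(h, axis=-1)
    # Spectrum selection: keep the lowest-frequency components.
    k = max(1, int(keep_ratio * spec.shape[-1]))
    mask = np.zeros(spec.shape[-1])
    mask[:k] = 1.0
    weak_spec = spec * mask
    # Renormalization (assumption: energy matching) so the truncated
    # spectrum keeps the same norm as the original, for stability.
    energy = np.linalg.norm(spec, axis=-1, keepdims=True)
    weak_energy = np.linalg.norm(weak_spec, axis=-1, keepdims=True) + 1e-8
    weak_spec = weak_spec * (energy / weak_energy)
    # Inverse transform back to the representation space.
    return np.fft.irfft(weak_spec, n=h.shape[-1], axis=-1)

def guided_output(strong, weak, scale=3.0):
    """CFG-style extrapolation away from the weak model's output."""
    return weak + scale * (strong - weak)
```

Note that with `keep_ratio=1.0` the full spectrum is retained and the transform round-trips exactly, which matches the stated property that invertible spectral transforms preserve information; lowering `keep_ratio` dials in the controlled information reduction.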