🤖 AI Summary
Visual autoregressive (VAR) models suffer from diversity collapse—severely diminished output variability during generation. To address this without increasing training overhead, we propose DiverseVAR, a plug-and-play method that enhances diversity through input preprocessing and output postprocessing. By analyzing multi-scale feature map components, we identify early-layer channel activations as decisive for diversity. Leveraging this insight, DiverseVAR introduces an input suppression and output amplification strategy, integrated with a training-free feature-space decomposition and component modulation mechanism. Crucially, it requires no fine-tuning or retraining. Experiments across multiple benchmarks demonstrate that DiverseVAR significantly improves generative diversity—evidenced by stable or improved FID and CLIP-Score—while preserving image fidelity and maintaining inference efficiency. Thus, DiverseVAR establishes an efficient, general-purpose paradigm for diversity enhancement in VAR models, advancing their practical deployment.
📝 Abstract
Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency and image quality compared to traditional multi-step autoregressive (AR) and diffusion models. However, despite their efficiency, VAR models often suffer from diversity collapse, i.e., a reduction in output variability, analogous to that observed in few-step distilled diffusion models. In this paper, we introduce DiverseVAR, a simple yet effective approach that restores the generative diversity of VAR models without requiring any additional training. Our analysis reveals the pivotal component of the feature map as a key factor governing diversity formation at early scales. By suppressing the pivotal component in the model input and amplifying it in the model output, DiverseVAR effectively unlocks the inherent generative potential of VAR models while preserving high-fidelity synthesis. Empirical results demonstrate that our approach substantially enhances generative diversity with only a negligible impact on performance. Our code will be publicly released at https://github.com/wangtong627/DiverseVAR.
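To make the suppress-then-amplify idea concrete, here is a minimal sketch of one plausible reading: extract a dominant ("pivotal") rank-1 component of a feature map via SVD, attenuate it before the model sees the input, and amplify it in the output. The function names, the SVD-based decomposition, and the scale factors are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: modulate a "pivotal" rank-1 feature component.
# The decomposition choice (SVD) and scales (0.5 / 1.5) are assumptions.
import numpy as np

def pivotal_component(feat: np.ndarray) -> np.ndarray:
    """Rank-1 component along the leading singular direction of a
    2-D feature map (tokens x channels)."""
    u, s, vt = np.linalg.svd(feat, full_matrices=False)
    return s[0] * np.outer(u[:, 0], vt[0])

def modulate(feat: np.ndarray, scale: float) -> np.ndarray:
    """Replace the pivotal component with a rescaled copy
    (scale < 1 suppresses it, scale > 1 amplifies it)."""
    pivot = pivotal_component(feat)
    return feat - pivot + scale * pivot

# Toy usage: suppress on the input side, amplify on the output side.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))   # stand-in for an early-scale feature map
x_in = modulate(x, scale=0.5)      # suppressed before the model
x_out = modulate(x, scale=1.5)     # amplified after the model
```

Because the modulation is a training-free, per-feature-map operation, it can in principle be bolted onto a pretrained VAR model without touching its weights, which matches the plug-and-play claim above.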