🤖 AI Summary
VAR models suffer from prohibitive computational overhead during large-scale autoregressive generation, and existing acceleration methods rely on manually predefined step counts, ignoring intrinsic semantic importance variations across generation stages.
Method: We propose the first training-free, stage-aware acceleration framework for VAR models. It quantitatively characterizes the disparity in semantic importance across generation stages: early stages are semantics-critical, while later stages focus on detail refinement. Building on this analysis, it introduces stage-adaptive dynamic pruning and low-rank approximation. The method is plug-and-play, requiring no fine-tuning or additional training; it relies only on semantic-irrelevance analysis and stage-wise computational compression.
Contribution/Results: Our approach achieves up to 3.4× speedup with only marginal quality degradation (a 0.01 drop on GenEval and a 0.26 drop on DPG), significantly outperforming state-of-the-art acceleration baselines and establishing stage-aware design as a new paradigm for efficient, semantics-informed VAR inference.
📝 Abstract
Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Although existing acceleration methods reduce runtime at large-scale steps, they rely on manual step selection and overlook the varying importance of different stages in the generation process. To address this challenge, we present StageVAR, a systematic study and stage-aware acceleration framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, StageVAR introduces a plug-and-play acceleration strategy that exploits semantic irrelevance and low-rank properties in late-stage computations, without requiring additional training. Our proposed StageVAR achieves up to 3.4× speedup with only a 0.01 drop on GenEval and a 0.26 decrease on DPG, consistently outperforming existing acceleration baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.
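The stage-aware idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names (`stage_aware_forward`, `low_rank_approx`), the norm-based token-importance score, the `semantic_stages` cutoff, and the truncated-SVD weight approximation are all illustrative assumptions. It only shows the general scheme the abstract describes: compute early (semantics-critical) scale steps exactly, and accelerate later (detail-refinement) steps via token pruning and low-rank approximation.

```python
import numpy as np

def low_rank_approx(W, rank):
    """Truncated-SVD approximation of a projection matrix W (illustrative)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vt[:rank]

def stage_aware_forward(x_per_stage, W, semantic_stages=4, rank=8, keep_ratio=0.5):
    """Hypothetical stage-aware inference over next-scale prediction steps.

    Early stages (step < semantic_stages) use the exact projection to preserve
    semantics; later stages prune low-importance tokens and use a low-rank W.
    """
    W_lr = low_rank_approx(W, rank)
    outputs = []
    for step, x in enumerate(x_per_stage):
        if step < semantic_stages:
            outputs.append(x @ W)  # exact computation: semantics-critical stage
        else:
            # Prune tokens by L2 norm (a stand-in importance score),
            # then apply the cheaper low-rank projection.
            k = max(1, int(keep_ratio * x.shape[0]))
            keep = np.argsort(-np.linalg.norm(x, axis=1))[:k]
            outputs.append(x[keep] @ W_lr)
    return outputs
```

Because later scale steps contain far more tokens than early ones, pruning and low-rank computation applied only to those steps is where most of the speedup would come from, while leaving the early, semantics-defining steps untouched.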