🤖 AI Summary
This work addresses the high computational cost of visual autoregressive models, which stems from processing massive token sequences. Existing acceleration methods are limited by heuristic phase partitioning, non-adaptive scheduling, and coarse-grained token pruning. To overcome these limitations, we propose NOVA, a framework that, for the first time, uses the inflection point of scale-wise entropy growth as an indicator of evolving modeling dynamics, enabling adaptive, fine-grained token pruning without any additional training. NOVA employs a dual-coupling mechanism across scales and layers to dynamically adjust per-layer pruning ratios, combined with low-entropy token removal and cross-scale residual cache reuse. Experiments demonstrate that NOVA achieves highly efficient, training-free acceleration of visual autoregressive generation while preserving output quality.
📝 Abstract
Visual AutoRegressive modeling (VAR) suffers from substantial computational cost due to the massive number of tokens involved. Because they fail to account for the continuous evolution of modeling dynamics, existing VAR token reduction methods face three key limitations: heuristic stage partitioning, non-adaptive schedules, and limited acceleration scope, leaving significant acceleration potential untapped. Since entropy variation intrinsically reflects the transition of predictive uncertainty, it offers a principled measure for capturing the evolution of modeling dynamics. We therefore propose NOVA, a training-free token reduction framework that accelerates VAR models via entropy analysis. NOVA adaptively determines the scale at which acceleration activates during inference by identifying, online, the inflection point of scale-wise entropy growth. Through scale-linkage and layer-linkage ratio adjustment, NOVA dynamically computes a distinct token reduction ratio for each scale and layer, pruning low-entropy tokens while reusing the cache derived from the residuals at the prior scale, thereby accelerating inference while maintaining generation quality. Extensive experiments and analyses validate NOVA as a simple yet effective training-free acceleration framework.
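The core ideas in the abstract — per-token entropy as an uncertainty signal, online detection of the inflection point in scale-wise entropy growth, and pruning of low-entropy tokens — can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's implementation: the function names (`token_entropy`, `entropy_inflection_scale`, `prune_low_entropy_tokens`), the second-difference inflection test, and the ratio-based pruning rule are all hypothetical simplifications.

```python
import numpy as np

def token_entropy(logits):
    """Per-token Shannon entropy from a (num_tokens, vocab_size) logit matrix."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

def entropy_inflection_scale(scale_entropies):
    """Hypothetical inflection detector: first scale where the growth of mean
    entropy starts to slow (second difference turns negative). The paper's
    online criterion may differ; this is only an illustrative proxy."""
    e = np.asarray(scale_entropies, dtype=float)
    second_diff = np.diff(e, n=2)
    neg = np.where(second_diff < 0)[0]
    return int(neg[0] + 1) if len(neg) else len(e) - 1

def prune_low_entropy_tokens(logits, keep_ratio):
    """Keep the `keep_ratio` fraction of tokens with the HIGHEST entropy,
    dropping low-entropy (low-uncertainty) tokens; returns kept indices."""
    ent = token_entropy(logits)
    k = max(1, int(round(keep_ratio * len(ent))))
    return np.sort(np.argsort(ent)[-k:])
```

In a full pipeline, the per-scale keep ratio would itself be modulated by the scale-linkage and layer-linkage adjustment the abstract describes, rather than fixed as in this sketch.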