🤖 AI Summary
This work addresses the pressing need for efficient compression of autoregressive video diffusion (ARVD) models, which suffer from high inference costs. Existing quantization methods are hindered by two key challenges: imbalanced inter-frame sensitivity and heterogeneous outlier distributions across model weights. The study is the first to reveal that frame sensitivity in ARVD quantization decays exponentially and that outlier patterns vary significantly across layers. To tackle these issues, the authors propose Q-ARVD, a novel framework featuring a generation-quality-aware frame-weighted quantization objective to better allocate importance across frames, an outlier-aware adaptive dual-scale quantization strategy, and an automatic outlier channel isolation mechanism. Experiments demonstrate that Q-ARVD substantially outperforms existing quantization approaches across multiple ARVD models, achieving significant computational savings while preserving high-quality video generation capabilities.
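As a rough illustration of what a frame-weighted quantization objective could look like, the sketch below weights the per-frame reconstruction error with exponentially decaying coefficients. It is not the paper's formulation: the function names `frame_weights` and `frame_weighted_mse`, the `decay` parameter, and the assumption that earlier frames are the more sensitive ones are all hypothetical. The paper derives its weights from final generation quality, which the summary does not spell out.

```python
import numpy as np

def frame_weights(num_frames, decay=0.5):
    """Hypothetical weights: exp(-decay * t), normalized to sum to 1.

    Assumes earlier frames matter more; the true direction and rate
    of the decay are not given in the summary.
    """
    w = np.exp(-decay * np.arange(num_frames))
    return w / w.sum()

def frame_weighted_mse(fp_out, q_out, weights):
    """Weighted sum of per-frame MSE between full-precision and quantized outputs.

    fp_out, q_out: arrays of shape (frames, ...), with frames on axis 0.
    """
    diff = (fp_out - q_out) ** 2
    per_frame = diff.reshape(diff.shape[0], -1).mean(axis=1)
    return float(np.dot(weights, per_frame))
```

Under these assumed weights, an error of the same magnitude costs more when it lands in an early frame than in a late one, which is the behavior a sensitivity-aware objective is meant to capture.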
📝 Abstract
Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation can induce severely skewed quantization sensitivity across frames, following an exponential-like decay pattern. (C2) Prominent and heterogeneous outlier patterns in weights. Weight distributions exhibit pronounced outlier channels, whose patterns vary substantially across layer types and block depths. To address these issues, we propose Q-ARVD, a novel framework for accurate ARVD quantization. (S1) To tackle the highly unbalanced frame-wise sensitivity, Q-ARVD incorporates a final-quality-aware frame-weighting mechanism into the quantization objective. (S2) To prevent heterogeneous outliers from degrading performance, Q-ARVD introduces an outlier-aware adaptive dual-scale quantization scheme, which automatically detects whether a given layer has outlier channels and how many, and isolates them to protect the normal channels. Extensive experiments demonstrate the superiority of Q-ARVD.
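The idea behind (S2) can be sketched in a few lines: flag channels whose magnitude is far above the rest, then quantize the flagged and unflagged channels with separate scales so that the outliers do not stretch the grid used for normal channels. This is a minimal sketch under stated assumptions, not Q-ARVD's actual algorithm. The detection rule (median plus `k` times the median absolute deviation), the threshold `k`, the choice of weight columns as channels, and the function names `quantize_sym` and `dual_scale_quantize` are all hypothetical.

```python
import numpy as np

def quantize_sym(x, scale, bits=4):
    """Symmetric uniform fake-quantization: round to the integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def dual_scale_quantize(w, bits=4, k=10.0):
    """Quantize a float weight matrix with one scale per channel group.

    w: (out_features, in_features); columns are treated as channels (assumption).
    Returns the fake-quantized weights and a boolean mask of outlier channels.
    """
    qmax = 2 ** (bits - 1) - 1
    ch_max = np.abs(w).max(axis=0)                  # per-channel magnitude
    med = np.median(ch_max)
    mad = np.median(np.abs(ch_max - med)) + 1e-12   # robust spread estimate
    outlier = ch_max > med + k * mad                # hypothetical detection rule

    w_q = np.empty_like(w)
    for mask in (outlier, ~outlier):
        if mask.any():
            # One shared scale per group: outliers no longer dictate
            # the step size used for the normal channels.
            scale = max(np.abs(w[:, mask]).max() / qmax, 1e-12)
            w_q[:, mask] = quantize_sym(w[:, mask], scale, bits)
    return w_q, outlier
```

When a layer has no flagged channels, the outlier group is empty and the routine falls back to ordinary single-scale quantization, which loosely mirrors the "detects the presence and quantity of outlier channels" behavior described above.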