🤖 AI Summary
Video diffusion Transformers (Video DiTs) are difficult to deploy because of their computational and memory overheads. Existing post-training quantization (PTQ) methods depend on lengthy, computation-heavy calibration procedures and suffer considerable quality degradation after quantization. To address this, we propose DVD-Quant, a data-free PTQ framework for Video DiTs, featuring three key innovations: (1) Progressive Bounded Quantization (PBQ), which gradually tightens quantization bounds to preserve representational fidelity; (2) Auto-scaling Rotated Quantization (ARQ), which applies orthogonal rotation and dynamic scale adjustment to mitigate quantization error accumulation; and (3) δ-Guided Bit Switching (δ-GBS), which adaptively assigns bit-widths based on per-layer sensitivity estimates. On HunyuanVideo, DVD-Quant achieves an approximately 2× inference speedup over full-precision baselines while maintaining visual fidelity, and it is the first method to enable W4A4 (4-bit weights and 4-bit activations) PTQ for Video DiTs without compromising video quality.
📝 Abstract
Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on lengthy, computation-heavy calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel Data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Progressive Bounded Quantization (PBQ) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately 2$\times$ speedup over full-precision baselines on HunyuanVideo while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.
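The abstract does not spell out how PBQ, ARQ, or $\delta$-GBS work, so the sketch below is not DVD-Quant's algorithm. It only illustrates the general idea behind rotation-based quantization that the name ARQ suggests: an orthogonal rotation spreads an outlier channel across all channels, so a 4-bit grid wastes less of its range. The Hadamard rotation, the per-vector symmetric quantizer, the synthetic outlier, and the helper names `fwht`, `fake_quant`, and `rel_err` are all assumptions made for illustration.

```python
import math
import random


def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform (len(v) must be a power of two)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(len(v))
    return [x * norm for x in v]


def fake_quant(v, bits=4):
    """Symmetric uniform quantize-then-dequantize with one scale per vector."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in v) / qmax
    if scale == 0.0:
        scale = 1.0
    return [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in v]


def rel_err(approx, exact):
    """Relative L2 error between two equal-length vectors."""
    num = math.sqrt(sum((a - e) ** 2 for a, e in zip(approx, exact)))
    den = math.sqrt(sum(e * e for e in exact))
    return num / den


# Synthetic activation vector with one large outlier channel (an assumption,
# chosen to mimic the outliers that make low-bit activation quantization hard).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(64)]
x[3] = 20.0

# Plain 4-bit quantization: the outlier sets the scale for every channel.
plain = fake_quant(x)

# Rotate, quantize, rotate back. The normalized Hadamard transform is
# symmetric and orthogonal, so applying it twice returns the input.
rotated = fwht(fake_quant(fwht(x)))

err_plain = rel_err(plain, x)
err_rot = rel_err(rotated, x)
print(f"plain 4-bit relative error:   {err_plain:.3f}")
print(f"rotated 4-bit relative error: {err_rot:.3f}")
```

In a W4A4 linear layer the inverse rotation would typically not be applied explicitly: because $(xH)(H^\top W) = xW$ for any orthogonal $H$, the rotation can be folded into the weights before they are quantized.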