DVD-Quant: Data-free Video Diffusion Transformers Quantization

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video diffusion Transformers (Video DiTs) face significant deployment challenges due to prohibitive computational and memory overheads; existing post-training quantization (PTQ) methods require large calibration datasets and suffer substantial accuracy degradation. To address this, we propose the first training-data-free PTQ framework for Video DiTs, featuring three key innovations: (1) Progressive Bounded Quantization, which gradually tightens quantization bounds to preserve representational fidelity; (2) Auto-scaling Rotated Quantization, which applies orthogonal rotation and dynamic scale adjustment to mitigate quantization error accumulation; and (3) δ-Guided Bit Switching, which adaptively assigns bit-widths (e.g., 4-bit weights and activations) based on per-layer sensitivity estimates. Our framework achieves lossless video generation quality under W4A4 quantization—demonstrated on HunyuanVideo—with a 2× inference speedup, thereby overcoming a critical bottleneck in efficient Video DiT deployment.
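The summary does not spell out how Progressive Bounded Quantization works internally, but the idea of "gradually tightening quantization bounds" can be sketched as a simple data-free search: start from the tensor's min/max range and progressively shrink the clipping bounds, keeping whichever pair minimizes reconstruction error. The function names and the fixed shrink schedule below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def uniform_quant(x, lo, hi, bits=4):
    # Uniform affine quantization of x into 2**bits levels over [lo, hi].
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def progressive_bounded_quant(x, bits=4, steps=20, shrink_step=0.02):
    # Hypothetical sketch: progressively tighten the clipping bounds
    # around the tensor's value range, keeping the pair that minimizes
    # reconstruction MSE. Only the tensor itself is needed, so the
    # procedure is calibration-data-free.
    lo, hi = float(x.min()), float(x.max())
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    best_lo, best_hi = lo, hi
    best_err = np.mean((uniform_quant(x, lo, hi, bits) - x) ** 2)
    for i in range(1, steps):
        h = half * (1.0 - shrink_step * i)   # tighten bounds symmetrically
        l, u = mid - h, mid + h
        err = np.mean((uniform_quant(x, l, u, bits) - x) ** 2)
        if err < best_err:
            best_lo, best_hi, best_err = l, u, err
    return best_lo, best_hi, best_err
```

Tightening the bounds trades clipping error on outliers against finer resolution for the bulk of values, which is why a modest shrink often lowers overall MSE at 4-bit precision.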

📝 Abstract
Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on lengthy, computation-heavy calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel Data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Progressive Bounded Quantization (PBQ) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately $2\times$ speedup over full-precision baselines on HunyuanVideo while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational and memory demands of Video DiTs
Eliminating dependence on heavy calibration in quantization
Minimizing performance loss in post-training quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Bounded Quantization for quantization-error reduction
Auto-scaling Rotated Quantization for calibration-free error mitigation
δ-Guided Bit Switching for adaptive bit-width allocation
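The rotation idea behind Auto-scaling Rotated Quantization can be illustrated with a small sketch: multiplying activations by an orthogonal matrix spreads outlier energy across channels, so a per-tensor quantizer wastes less range on a few extreme values. The random-QR rotation and helper names below are assumptions for illustration; the paper's actual rotation and scaling scheme may differ:

```python
import numpy as np

def random_orthogonal(n, seed=0):
    # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def quant_mse(x, bits=4):
    # Per-tensor uniform quantization error (no rotation).
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels) * scale + lo
    return np.mean((q - x) ** 2)

def rotated_quant_mse(x, bits=4, seed=0):
    # Hypothetical sketch: rotate the channel dimension to smooth out
    # outliers, quantize per-tensor, rotate back, then measure MSE
    # against the original input (rotation is lossless in exact math).
    r = random_orthogonal(x.shape[-1], seed)
    xr = x @ r
    lo, hi = float(xr.min()), float(xr.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((xr - lo) / scale), 0, levels) * scale + lo
    return np.mean((q @ r.T - x) ** 2)
```

On an activation tensor with a single outlier-heavy channel, the rotated variant typically yields a much lower error than direct per-tensor quantization, since the orthogonal transform preserves the signal while flattening its dynamic range.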
Zhiteng Li
Shanghai Jiao Tong University
Large Language Models · Model Compression · Computer Vision
Hanxuan Li
Zhejiang University
Junyi Wu
Shanghai Jiao Tong University
Kai Liu
Shanghai Jiao Tong University
Linghe Kong
Shanghai Jiao Tong University
Internet of Things · Mobile Computing · Big Data
Guihai Chen
Professor of Computer Science, Computer Science and Technology
Yulun Zhang
Shanghai Jiao Tong University
Xiaokang Yang
Shanghai Jiao Tong University