DVD-Quant: Data-free Video Diffusion Transformers Quantization

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Video diffusion Transformers (Video DiTs) face significant deployment challenges due to prohibitive computational and memory overheads; existing post-training quantization (PTQ) methods require large calibration datasets and suffer substantial accuracy degradation. To address this, we propose the first training-data-free PTQ framework for Video DiTs, featuring three key innovations: (1) Progressive Bounded Quantization, which gradually tightens quantization bounds to preserve representational fidelity; (2) Auto-scaling Rotated Quantization, which applies orthogonal rotation and dynamic scale adjustment to mitigate quantization error accumulation; and (3) δ-Guided Bit Switching, which adaptively assigns bit-widths (e.g., 4-bit weights and activations) based on per-layer sensitivity estimates. Our framework achieves lossless video generation quality under W4A4 quantization—demonstrated on HunyuanVideo—with a 2× inference speedup, thereby overcoming a critical bottleneck in efficient Video DiT deployment.
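The summary does not spell out how Progressive Bounded Quantization works internally, but the idea of "gradually tightening quantization bounds" can be sketched as a simple data-free search: start from the tensor's min/max range and progressively shrink the clipping bounds, keeping whichever pair minimizes reconstruction error. The function names and the fixed shrink schedule below are illustrative assumptions, not the paper's actual algorithm:

```python
import numpy as np

def uniform_quant(x, lo, hi, bits=4):
    # Uniform affine quantization of x into 2**bits levels over [lo, hi].
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def progressive_bounded_quant(x, bits=4, steps=20, shrink_step=0.02):
    # Hypothetical sketch: progressively tighten the clipping bounds
    # around the tensor's value range, keeping the pair that minimizes
    # reconstruction MSE. Only the tensor itself is needed, so the
    # procedure is calibration-data-free.
    lo, hi = float(x.min()), float(x.max())
    mid, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    best_lo, best_hi = lo, hi
    best_err = np.mean((uniform_quant(x, lo, hi, bits) - x) ** 2)
    for i in range(1, steps):
        h = half * (1.0 - shrink_step * i)   # tighten bounds symmetrically
        l, u = mid - h, mid + h
        err = np.mean((uniform_quant(x, l, u, bits) - x) ** 2)
        if err < best_err:
            best_lo, best_hi, best_err = l, u, err
    return best_lo, best_hi, best_err
```

Tightening the bounds trades clipping error on outliers against finer resolution for the bulk of values, which is why a modest shrink often lowers overall MSE at 4-bit precision.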

📝 Abstract
Diffusion Transformers (DiTs) have emerged as the state-of-the-art architecture for video generation, yet their computational and memory demands hinder practical deployment. While post-training quantization (PTQ) presents a promising approach to accelerate Video DiT models, existing methods suffer from two critical limitations: (1) dependence on lengthy, computation-heavy calibration procedures, and (2) considerable performance deterioration after quantization. To address these challenges, we propose DVD-Quant, a novel Data-free quantization framework for Video DiTs. Our approach integrates three key innovations: (1) Progressive Bounded Quantization (PBQ) and (2) Auto-scaling Rotated Quantization (ARQ) for calibration data-free quantization error reduction, as well as (3) $\delta$-Guided Bit Switching ($\delta$-GBS) for adaptive bit-width allocation. Extensive experiments across multiple video generation benchmarks demonstrate that DVD-Quant achieves an approximately $2\times$ speedup over full-precision baselines on HunyuanVideo while maintaining visual fidelity. Notably, DVD-Quant is the first to enable W4A4 PTQ for Video DiTs without compromising video quality. Code and models will be available at https://github.com/lhxcs/DVD-Quant.
Problem

Research questions and friction points this paper is trying to address.

Reducing computational and memory demands of Video DiTs
Eliminating dependence on heavy calibration in quantization
Minimizing performance loss in post-training quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Bounded Quantization for quantization-error reduction
Auto-scaling Rotated Quantization for calibration-free error mitigation
δ-Guided Bit Switching for adaptive bit-width allocation
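The rotation idea behind Auto-scaling Rotated Quantization can be illustrated with a small sketch: multiplying activations by an orthogonal matrix spreads outlier energy across channels, so a per-tensor quantizer wastes less range on a few extreme values. The random-QR rotation and helper names below are assumptions for illustration; the paper's actual rotation and scaling scheme may differ:

```python
import numpy as np

def random_orthogonal(n, seed=0):
    # Random orthogonal matrix via QR decomposition of a Gaussian matrix.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def quant_mse(x, bits=4):
    # Per-tensor uniform quantization error (no rotation).
    lo, hi = float(x.min()), float(x.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((x - lo) / scale), 0, levels) * scale + lo
    return np.mean((q - x) ** 2)

def rotated_quant_mse(x, bits=4, seed=0):
    # Hypothetical sketch: rotate the channel dimension to smooth out
    # outliers, quantize per-tensor, rotate back, then measure MSE
    # against the original input (rotation is lossless in exact math).
    r = random_orthogonal(x.shape[-1], seed)
    xr = x @ r
    lo, hi = float(xr.min()), float(xr.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels
    q = np.clip(np.round((xr - lo) / scale), 0, levels) * scale + lo
    return np.mean((q @ r.T - x) ** 2)
```

On an activation tensor with a single outlier-heavy channel, the rotated variant typically yields a much lower error than direct per-tensor quantization, since the orthogonal transform preserves the signal while flattening its dynamic range.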
Zhiteng Li
Shanghai Jiao Tong University
Large Language Models · Model Compression · Computer Vision
Hanxuan Li
Zhejiang University
Junyi Wu
Shanghai Jiao Tong University
Kai Liu
Shanghai Jiao Tong University
Linghe Kong
Shanghai Jiao Tong University
Internet of Things · Mobile Computing · Big Data
Guihai Chen
Professor of Computer Science, Computer Science and Technology
Yulun Zhang
Shanghai Jiao Tong University
Xiaokang Yang
Shanghai Jiao Tong University