Temporal Feature Matters: A Framework for Diffusion Model Quantization

📅 2024-07-28
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models suffer from high computational latency and memory overhead during inference, and existing post-training quantization (PTQ) methods overlook their timestep-sensitive nature, leading to distorted denoising trajectories and degraded quantization accuracy. To address this, the paper proposes a temporal-feature-aware PTQ framework: (i) a Temporal Information Block (TIB) explicitly models temporal dependencies; (ii) Temporal Information-aware Reconstruction (TIAR) and Finite Set Calibration (FSC) jointly optimize quantization parameters; and (iii) a cache-based temporal feature maintenance mechanism, coupled with disturbance-aware selection between the two strategies, ensures robust temporal fidelity. Extensive experiments across diverse diffusion architectures (e.g., DDPM, DDIM), datasets (CIFAR-10, ImageNet), and hardware platforms (GPU, edge accelerators) demonstrate substantial improvements in both quantization accuracy (+3.2–5.7 dB PSNR) and inference throughput (2.1–3.8× speedup), while preserving timestep-specific dynamics. End-to-end generation quality closely matches floating-point baselines, achieving state-of-the-art temporal feature fidelity under 4-bit quantization.
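The Finite Set Calibration (FSC) idea rests on the observation that a diffusion sampler only ever visits a finite set of timesteps, so quantization parameters for the temporal path can be calibrated per timestep exhaustively rather than from generic calibration data. A minimal NumPy sketch; the grid search, bit-width, and MSE objective are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def quantize(x, scale, bits=4):
    """Symmetric uniform quantization with a given scale (illustrative)."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def calibrate_finite_set(temporal_features, bits=4, n_grid=50):
    """Pick a per-timestep scale minimizing quantization MSE.

    temporal_features: dict mapping timestep t -> feature vector (ndarray).
    Returns dict t -> best scale. A hypothetical simplification of FSC:
    because the timestep set is finite, each timestep's feature can be
    calibrated directly instead of being averaged over random inputs.
    """
    qmax = 2 ** (bits - 1) - 1
    scales = {}
    for t, feat in temporal_features.items():
        max_abs = np.abs(feat).max()
        best_scale, best_err = max_abs / qmax, np.inf
        # Grid-search clipping fractions of the max-abs range.
        for frac in np.linspace(0.2, 1.0, n_grid):
            scale = frac * max_abs / qmax
            err = np.mean((feat - quantize(feat, scale, bits)) ** 2)
            if err < best_err:
                best_scale, best_err = scale, err
        scales[t] = best_scale
    return scales
```

Because the search includes the naive full-range scale as one grid point, the calibrated scale can only match or improve on it for each timestep.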

📝 Abstract
Diffusion models, widely used for image generation, face significant obstacles to broad applicability due to prolonged inference times and high memory demands. Efficient Post-Training Quantization (PTQ) is crucial to address these issues. However, unlike traditional models, diffusion models critically rely on the time-step for multi-round denoising. Typically, each time-step is encoded into a hypersensitive temporal feature by several modules. Despite this, existing PTQ methods do not optimize these modules individually. Instead, they employ unsuitable reconstruction objectives and complex calibration methods, leading to significant disturbances in the temporal feature and denoising trajectory, as well as reduced compression efficiency. To address these challenges, we introduce a novel quantization framework with three strategies: 1) TIB-based Maintenance: Based on our definition of the Temporal Information Block (TIB), Temporal Information-aware Reconstruction (TIAR) and Finite Set Calibration (FSC) are developed to efficiently align the original temporal features. 2) Cache-based Maintenance: Instead of indirect and complex optimization of the related modules, we pre-compute and cache quantized counterparts of the temporal features to minimize errors. 3) Disturbance-aware Selection: We employ temporal feature errors to guide a fine-grained selection between the two maintenance strategies for further disturbance reduction. This framework preserves most of the temporal information and ensures high-quality end-to-end generation. Extensive testing on various datasets, diffusion models and hardware confirms our superior performance and acceleration.
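Cache-based maintenance, as the abstract describes it, amounts to a lookup table: every timestep's temporal feature can be computed offline at full precision and quantized once, so the error-prone quantized time-embedding modules are bypassed entirely at inference. A minimal sketch, where `time_embed_fn` and `quantizer` are hypothetical placeholders for the model's time-embedding path and the chosen feature quantizer:

```python
import numpy as np

def build_temporal_cache(time_embed_fn, timesteps, quantizer):
    """Pre-compute and cache quantized temporal features (illustrative).

    time_embed_fn: full-precision timestep -> feature mapping (placeholder).
    timesteps: the finite set of timesteps the sampler will visit.
    quantizer: feature quantization function (placeholder).
    """
    return {t: quantizer(time_embed_fn(t)) for t in timesteps}

def temporal_feature(cache, t):
    # At inference, the quantized time-embedding modules are never run:
    # the cached feature carries no additional module-level quantization error.
    return cache[t]
```

For example, with a toy sinusoidal embedding and a coarse rounding quantizer, `temporal_feature(cache, t)` returns exactly the offline-quantized feature for each cached timestep.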
Problem

Research questions and friction points this paper is trying to address.

Address prolonged inference times in diffusion models
Reduce high memory demands in image generation
Optimize temporal feature preservation in quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Information Block (TIB) for feature alignment
Cache-based maintenance to minimize quantization errors
Disturbance-aware selection for strategy optimization
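The disturbance-aware selection point above reduces to a per-timestep comparison: measure each maintenance strategy's temporal-feature error against the full-precision feature and keep the smaller one. A minimal sketch, assuming an L2 error metric (the paper's exact disturbance measure may differ):

```python
import numpy as np

def select_strategy(fp_feature, tib_feature, cached_feature):
    """Pick the maintenance strategy with the smaller temporal-feature error.

    fp_feature: full-precision temporal feature (reference).
    tib_feature: feature produced by TIB-based maintenance.
    cached_feature: feature produced by cache-based maintenance.
    Returns (strategy_name, selected_feature).
    """
    err_tib = np.linalg.norm(fp_feature - tib_feature)
    err_cache = np.linalg.norm(fp_feature - cached_feature)
    if err_tib <= err_cache:
        return "tib", tib_feature
    return "cache", cached_feature
```

Running this once per timestep during calibration yields a fixed hybrid schedule of strategies that is then reused at inference.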
👥 Authors
Yushi Huang, Hong Kong University of Science and Technology
Ruihao Gong, SenseTime Research, China; State Key Laboratory of Software Development Environment, Beihang University, China
Xianglong Liu, State Key Laboratory of Software Development Environment, Beihang University, China
Jing Liu, Department of Data Science and AI, Faculty of IT, Monash University, Australia
Yuhang Li, Yale University
Jiwen Lu, Department of Automation, Tsinghua University, China
Dacheng Tao, Nanyang Technological University