🤖 AI Summary
In post-training quantization of diffusion models, quantization errors accumulate across denoising steps, severely degrading generation fidelity. To address this, we propose the first sampling-aware joint quantization framework that explicitly models cross-step error propagation. Our method introduces a multi-step output alignment objective for joint calibration across denoising iterations and incorporates an O(1)-memory gradient optimization strategy—eliminating the O(n) storage overhead of conventional approaches. The entire quantization process is performed via end-to-end post-training, guided by full-precision model outputs. Extensive experiments across diverse diffusion architectures (e.g., DDPM, DDIM, LDM) and benchmarks (e.g., CIFAR-10, CelebA-HQ, LSUN) demonstrate that our approach significantly outperforms stepwise independent quantization: it achieves state-of-the-art compression efficiency (e.g., 4-bit weights/activations) while preserving high-fidelity image generation quality.
📝 Abstract
We present in this paper a novel post-training quantization (PTQ) method, dubbed AccuQuant, for diffusion models. We show analytically and empirically that quantization errors for diffusion models are accumulated over denoising steps in a sampling process. To alleviate the error accumulation problem, AccuQuant minimizes the discrepancies between outputs of a full-precision diffusion model and its quantized version within a couple of denoising steps. That is, it simulates multiple denoising steps of a diffusion sampling process explicitly for quantization, accounting the accumulated errors over multiple denoising steps, which is in contrast to previous approaches to imitating a training process of diffusion models, namely, minimizing the discrepancies independently for each step. We also present an efficient implementation technique for AccuQuant, together with a novel objective, which reduces a memory complexity significantly from $mathcal{O}(n)$ to $mathcal{O}(1)$, where $n$ is the number of denoising steps. We demonstrate the efficacy and efficiency of AccuQuant across various tasks and diffusion models on standard benchmarks.