FP4DiT: Towards Effective Floating Point Quantization for Diffusion Transformers

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational and memory overhead of Diffusion Transformers (DiTs) on edge devices, this work proposes the first post-training floating-point quantization (FPQ) framework tailored for DiT architectures. Our method introduces three key innovations: (1) the first application of floating-point quantization to DiT compression; (2) an adaptive rounding weight quantization scheme to mitigate weight distribution shift under low-bit precision; and (3) an input-block-wise online activation calibration strategy to improve activation distribution alignment. Evaluated under a hardware-friendly W4A6 configuration, our approach significantly outperforms INT4/8 baselines on PixArt-α/Σ and Hunyuan models, achieving +1.8 and +2.3 improvements in HPSv2 and CLIP scores, respectively. The method enables high-fidelity text-to-image generation on resource-constrained edge platforms.

📝 Abstract
Diffusion Models (DM) have revolutionized text-to-image visual generation. However, the large computational cost and model footprint of DMs hinder practical deployment, especially on edge devices. Post-training quantization (PTQ) is a lightweight method to alleviate these burdens without the need for training or fine-tuning. While recent DM PTQ methods achieve W4A8 with integer-based PTQ, two key limitations remain: First, most existing DM PTQ methods evaluate on classical DMs like Stable Diffusion XL, 1.5 or earlier, which use convolutional U-Nets, whereas newer Diffusion Transformer (DiT) models like the PixArt series, Hunyuan and others adopt fundamentally different transformer backbones to achieve superior image synthesis. Second, integer (INT) quantization is prevalent in DM PTQ but does not align well with the network weight and activation distributions, while Floating-Point Quantization (FPQ) is still under-investigated, yet it holds the potential to better align the weight and activation distributions in low-bit settings for DiT. In response, we introduce FP4DiT, a PTQ method that leverages FPQ to achieve W4A6 quantization. Specifically, we extend and generalize the Adaptive Rounding PTQ technique to adequately calibrate weight quantization for FPQ and demonstrate that DiT activations depend on input patch data, necessitating robust online activation quantization techniques. Experimental results demonstrate that FP4DiT outperforms integer-based PTQ at W4A6 and W4A8 precision and generates convincing visual content on PixArt-α, PixArt-Σ and Hunyuan in terms of several T2I metrics such as HPSv2 and CLIP.
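The paper does not include code, but the core idea of low-bit floating-point weight quantization can be illustrated with a minimal sketch. FP4 in the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) can only represent magnitudes {0, 0.5, 1, 1.5, 2, 3, 4, 6}; quantization snaps each scaled weight to the nearest point on this non-uniform grid, which is denser near zero than a uniform INT4 grid. The function name and the brute-force nearest-point search below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 format (1 sign, 2 exponent,
# 1 mantissa bit). Note the grid is denser near zero than INT4's uniform grid.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_quantize(w, scale):
    """Quantize an array to the FP4 grid by round-to-nearest (sketch only)."""
    x = w / scale
    sign = np.sign(x)
    mag = np.abs(x)
    # Brute-force nearest grid point; fine for a small illustrative example.
    idx = np.argmin(np.abs(mag[..., None] - FP4_GRID), axis=-1)
    return sign * FP4_GRID[idx] * scale

# Each value snaps to its nearest representable point, e.g. 5.5 -> 6.0.
quantized = fp4_quantize(np.array([0.07, -0.45, 1.2, 5.5]), scale=1.0)
```

Because most network weights concentrate in a bell-shaped distribution around zero, this denser-near-zero grid is what lets FPQ align with the weight distribution better than uniform integer quantization at the same bit width.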
Problem

Research questions and friction points this paper is trying to address.

The large computational cost and model footprint of Diffusion Transformers hinder deployment on edge devices.
Existing DM PTQ methods target convolutional U-Net models (e.g., Stable Diffusion XL, 1.5) rather than newer DiT backbones such as PixArt and Hunyuan.
Integer quantization aligns poorly with DiT weight and activation distributions, while Floating-Point Quantization remains under-investigated.
Innovation

Methods, ideas, or system contributions that make the work stand out.

FP4DiT applies Floating-Point Quantization to Diffusion Transformers, achieving W4A6 post-training quantization.
An extended and generalized Adaptive Rounding PTQ technique calibrates weight quantization for the floating-point grid.
Online activation quantization accounts for the dependence of DiT activations on input patch data.
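The Adaptive Rounding extension is not detailed in this summary, but the original integer-grid AdaRound parameterization it generalizes is simple: instead of rounding each weight to the nearest grid point, the quantizer takes the floor and adds a learned per-weight offset h(v) in [0, 1] that calibration drives toward 0 (round down) or 1 (round up). A minimal sketch under those assumptions, with the calibration loop omitted:

```python
import numpy as np

ZETA, GAMMA = 1.1, -0.1  # stretch constants from the original AdaRound formulation

def rectified_sigmoid(v):
    # Soft rounding variable h(v), clipped to [0, 1]. During calibration,
    # v is optimized so h(v) settles at exactly 0 or 1 per weight.
    return np.clip(1.0 / (1.0 + np.exp(-v)) * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def adaround_quantize(w, scale, v, n_bits=4):
    # Floor plus a learned up/down offset instead of round-to-nearest,
    # clamped to the signed integer grid. FP4DiT generalizes this idea
    # from the uniform integer grid to the non-uniform FP4 grid.
    q = np.floor(w / scale) + rectified_sigmoid(v)
    q = np.clip(q, -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return q * scale
```

With a large positive v the weight rounds up, and with a large negative v it rounds down; learning v per weight lets calibration minimize the layer's output reconstruction error rather than the per-weight rounding error.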