LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Text-to-Image Generation

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe performance degradation in post-training quantization (PTQ) of diffusion transformers (DiTs) at ultra-low bit-widths—caused by heavy-tailed weight distributions and activation outliers—this paper proposes an efficient DiT-specific low-bit quantization framework. Methodologically: (1) Twin-Log Quantization (TLQ) is introduced to fit the Gaussian-like, long-tailed weight distribution more accurately than uniform quantization; (2) an Adaptive Rotation Scheme (ARS) is designed, combining Hadamard transformations with outlier-aware rotations to dynamically suppress both mild and salient activation outliers. Extensive evaluation on PixArt-α and FLUX models across the COCO, MJHQ, and sDCI benchmarks demonstrates substantial gains over state-of-the-art PTQ methods. The framework maintains high-fidelity image generation even at 2–3 bits, enabling stable and efficient deployment of DiTs under extreme low-bit settings.

📝 Abstract
Diffusion Transformers (DiTs) have achieved impressive performance in text-to-image generation. However, their high computational cost and large parameter sizes pose significant challenges for usage in resource-constrained scenarios. Post-training quantization (PTQ) is a promising solution to reduce memory usage and accelerate inference, but existing PTQ methods suffer from severe performance degradation under extreme low-bit settings. We identify two key obstacles to low-bit post-training quantization for DiT models: (1) model weights follow a Gaussian-like distribution with long tails, causing uniform quantization to poorly allocate intervals and leading to significant errors; (2) two types of activation outliers: (i) Mild Outliers with slightly elevated values, and (ii) Salient Outliers with large magnitudes concentrated in specific channels, which disrupt activation quantization. To address these issues, we propose LRQ-DiT, an efficient and accurate PTQ framework. We introduce Twin-Log Quantization (TLQ), a log-based method that aligns well with the weight distribution and reduces quantization errors. We also propose an Adaptive Rotation Scheme (ARS) that dynamically applies Hadamard or outlier-aware rotations based on activation fluctuation, effectively mitigating the impact of both types of outliers. We evaluate LRQ-DiT on PixArt and FLUX under various bit-width settings, and validate the performance on COCO, MJHQ, and sDCI datasets. LRQ-DiT achieves low-bit quantization of DiT models while preserving image quality, outperforming existing PTQ baselines.
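The abstract notes that DiT weights follow a Gaussian-like, long-tailed distribution that uniform quantization covers poorly, and that TLQ is log-based. The paper's exact TLQ formulation is not reproduced on this page; as a rough illustration (a hypothetical sketch, not the authors' method) of why logarithmic levels suit such distributions, a minimal signed power-of-two quantizer could look like:

```python
import numpy as np

def log2_quantize(w, bits=3):
    """Toy signed log2 quantizer: snap each |w| to the nearest
    power-of-two level below the tensor's max magnitude.
    Illustrative only -- NOT the paper's Twin-Log Quantization."""
    sign = np.sign(w)
    mag = np.abs(w)
    scale = mag.max()
    n_levels = 2 ** (bits - 1) - 1          # one bit reserved for sign
    ratio = np.clip(mag / scale, 1e-12, 1.0)  # guard against log(0)
    exp = np.clip(np.round(-np.log2(ratio)), 0, n_levels)
    return sign * scale * 2.0 ** (-exp)

w = np.random.randn(1024) * 0.1   # Gaussian-like weights with long tails
wq = log2_quantize(w, bits=3)
```

Because the levels are exponentially spaced, small-magnitude weights (the bulk of a Gaussian-like distribution) get finer resolution than a uniform grid would give them, while the long tail is still covered by the largest levels.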
Problem

Research questions and friction points this paper is trying to address.

Quantize Diffusion Transformers efficiently under low-bit settings
Address weight distribution and activation outliers in DiTs
Preserve image quality in text-to-image generation post-quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Twin-Log Quantization (TLQ) aligned with the long-tailed weight distribution
Adaptive Rotation Scheme for outlier mitigation
Efficient low-bit post-training quantization framework
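The Adaptive Rotation Scheme described above applies Hadamard or outlier-aware rotations depending on activation fluctuation; the precise scheme is in the paper. As a generic sketch (assuming a power-of-two channel count, not the authors' exact construction) of how a Hadamard rotation spreads a salient outlier channel across all channels, making the activations easier to quantize:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an orthonormal n x n Hadamard
    matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# Toy activations with one salient-outlier channel, mirroring the
# paper's observation about channel-concentrated outliers.
X = np.random.randn(64, 8)
X[:, 3] *= 50.0                 # salient outlier channel

H = hadamard(8)
X_rot = X @ H                   # rotate channels; a following weight
                                # matrix can absorb H^T, so the layer
                                # output is mathematically unchanged

# The outlier energy is now shared across channels, so the
# max-to-mean magnitude ratio (a proxy for quantization difficulty)
# shrinks after rotation.
before = np.abs(X).max() / np.abs(X).mean()
after = np.abs(X_rot).max() / np.abs(X_rot).mean()
```

Since the Sylvester Hadamard matrix is orthonormal and symmetric, `X_rot @ H.T` recovers `X` exactly, which is why such rotations can be fused into adjacent weights at no accuracy cost.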
Lianwei Yang
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Haokun Lin
City University of Hong Kong & CASIA
Multi-modal Learning · Efficient Deep Learning
Tianchen Zhao
Department of Electronic Engineering, Tsinghua University
Yichen Wu
Harvard University | CityU-HK | XJTU
Continual Learning · Transfer Learning · LLM Editing · Medical Image Analysis
Hongyu Zhu
Department of Electronic Engineering, Tsinghua University
Ruiqi Xie
Department of Electronic Engineering, Tsinghua University
Zhenan Sun
Institute of Automation, Chinese Academy of Sciences
Biometrics · Pattern Recognition · Computer Vision
Yu Wang
Department of Electronic Engineering, Tsinghua University
Qingyi Gu
Institute of Automation, Chinese Academy of Sciences
High-speed Vision · Cell Analysis