LRQ-DiT: Log-Rotation Post-Training Quantization of Diffusion Transformers for Text-to-Image Generation

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address severe performance degradation in post-training quantization (PTQ) of diffusion transformers (DiTs) at ultra-low bit-widths—caused by heavy-tailed weight distributions and activation outliers—this paper proposes an efficient DiT-specific low-bit quantization framework. Methodologically: (1) Twin-Log Quantization (TLQ) is introduced to fit the Gaussian-like, long-tailed weight distribution more accurately than uniform quantization; (2) an Adaptive Rotation Scheme (ARS) is designed, combining Hadamard transformations with outlier-aware rotations to dynamically suppress both mild and salient activation outliers. Extensive evaluation on PixArt-α and FLUX models across the COCO, MJHQ, and sDCI benchmarks demonstrates substantial gains over state-of-the-art PTQ methods. The framework maintains high-fidelity image generation even at 2–3 bits, enabling stable and efficient deployment of DiTs under extreme low-bit settings.

📝 Abstract
Diffusion Transformers (DiTs) have achieved impressive performance in text-to-image generation. However, their high computational cost and large parameter sizes pose significant challenges for usage in resource-constrained scenarios. Post-training quantization (PTQ) is a promising solution to reduce memory usage and accelerate inference, but existing PTQ methods suffer from severe performance degradation under extreme low-bit settings. We identify two key obstacles to low-bit post-training quantization for DiT models: (1) model weights follow a Gaussian-like distribution with long tails, causing uniform quantization to poorly allocate intervals and leading to significant errors; (2) two types of activation outliers: (i) Mild Outliers with slightly elevated values, and (ii) Salient Outliers with large magnitudes concentrated in specific channels, which disrupt activation quantization. To address these issues, we propose LRQ-DiT, an efficient and accurate PTQ framework. We introduce Twin-Log Quantization (TLQ), a log-based method that aligns well with the weight distribution and reduces quantization errors. We also propose an Adaptive Rotation Scheme (ARS) that dynamically applies Hadamard or outlier-aware rotations based on activation fluctuation, effectively mitigating the impact of both types of outliers. We evaluate LRQ-DiT on PixArt and FLUX under various bit-width settings, and validate the performance on COCO, MJHQ, and sDCI datasets. LRQ-DiT achieves low-bit quantization of DiT models while preserving image quality, outperforming existing PTQ baselines.
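The abstract notes that DiT weights follow a Gaussian-like, long-tailed distribution that uniform quantization covers poorly, and that TLQ is log-based. The paper's exact TLQ formulation is not reproduced on this page; as a rough illustration (a hypothetical sketch, not the authors' method) of why logarithmic levels suit such distributions, a minimal signed power-of-two quantizer could look like:

```python
import numpy as np

def log2_quantize(w, bits=3):
    """Toy signed log2 quantizer: snap each |w| to the nearest
    power-of-two level below the tensor's max magnitude.
    Illustrative only -- NOT the paper's Twin-Log Quantization."""
    sign = np.sign(w)
    mag = np.abs(w)
    scale = mag.max()
    n_levels = 2 ** (bits - 1) - 1          # one bit reserved for sign
    ratio = np.clip(mag / scale, 1e-12, 1.0)  # guard against log(0)
    exp = np.clip(np.round(-np.log2(ratio)), 0, n_levels)
    return sign * scale * 2.0 ** (-exp)

w = np.random.randn(1024) * 0.1   # Gaussian-like weights with long tails
wq = log2_quantize(w, bits=3)
```

Because the levels are exponentially spaced, small-magnitude weights (the bulk of a Gaussian-like distribution) get finer resolution than a uniform grid would give them, while the long tail is still covered by the largest levels.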
Problem

Research questions and friction points this paper is trying to address.

Quantize Diffusion Transformers efficiently under low-bit settings
Address weight distribution and activation outliers in DiTs
Preserve image quality in text-to-image generation post-quantization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Twin-Log Quantization (TLQ) aligned with the long-tailed weight distribution
Adaptive Rotation Scheme for outlier mitigation
Efficient low-bit post-training quantization framework
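The Adaptive Rotation Scheme described above applies Hadamard or outlier-aware rotations depending on activation fluctuation; the precise scheme is in the paper. As a generic sketch (assuming a power-of-two channel count, not the authors' exact construction) of how a Hadamard rotation spreads a salient outlier channel across all channels, making the activations easier to quantize:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an orthonormal n x n Hadamard
    matrix; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

# Toy activations with one salient-outlier channel, mirroring the
# paper's observation about channel-concentrated outliers.
X = np.random.randn(64, 8)
X[:, 3] *= 50.0                 # salient outlier channel

H = hadamard(8)
X_rot = X @ H                   # rotate channels; a following weight
                                # matrix can absorb H^T, so the layer
                                # output is mathematically unchanged

# The outlier energy is now shared across channels, so the
# max-to-mean magnitude ratio (a proxy for quantization difficulty)
# shrinks after rotation.
before = np.abs(X).max() / np.abs(X).mean()
after = np.abs(X_rot).max() / np.abs(X_rot).mean()
```

Since the Sylvester Hadamard matrix is orthonormal and symmetric, `X_rot @ H.T` recovers `X` exactly, which is why such rotations can be fused into adjacent weights at no accuracy cost.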
Lianwei Yang
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Haokun Lin
City University of Hong Kong & CASIA
Multi-modal Learning · Efficient Deep Learning
Tianchen Zhao
Department of Electronic Engineering, Tsinghua University
Yichen Wu
Harvard University | CityU-HK | XJTU
Continual Learning · Transfer Learning · LLM Editing · Medical Image Analysis
Hongyu Zhu
Department of Electronic Engineering, Tsinghua University
Ruiqi Xie
Department of Electronic Engineering, Tsinghua University
Zhenan Sun
Institute of Automation, Chinese Academy of Sciences
Biometrics · Pattern Recognition · Computer Vision
Yu Wang
Department of Electronic Engineering, Tsinghua University
Qingyi Gu
Institute of Automation, Chinese Academy of Sciences
High-speed Vision · Cell Analysis