🤖 AI Summary
Diffusion models—particularly Diffusion Transformers—face significant challenges for on-device deployment due to their high memory and computational overhead. Post-training quantization (PTQ) further struggles with activation outliers, limiting its effectiveness for aggressive low-bit quantization (e.g., 4-bit). To address this, we propose HadaNorm: a fine-tuning-free linear preprocessing technique that applies channel-wise mean centering before a Hadamard transformation. This method effectively suppresses activation outliers across Transformer blocks, thereby enhancing PTQ robustness. Specifically designed for Diffusion Transformer architectures, HadaNorm consistently reduces quantization error across multiple components—including attention and feed-forward layers—without architectural modification or retraining. Experiments demonstrate superior accuracy-efficiency trade-offs over state-of-the-art methods, enabling, for the first time, practical 4-bit quantization of Diffusion Transformers suitable for efficient on-device inference.
📝 Abstract
Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices. Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. However, standard PTQ methods struggle with outliers, and achieving higher compression often requires transforming model weights and activations before quantization. In this work, we propose HadaNorm, a novel linear transformation that extends existing approaches and effectively mitigates outliers by normalizing activation feature channels before applying Hadamard transformations, enabling more aggressive activation quantization. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, achieving superior efficiency-performance trade-offs when compared to state-of-the-art methods.
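The pipeline the abstract describes — center each activation channel, rotate tokens with an orthonormal Hadamard transform to spread outlier energy across channels, then quantize at low bitwidth — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names (`fwht`, `hadanorm`, `quantize_sym`), the per-tensor symmetric quantizer, and the assumption that the channel count is a power of two are all ours.

```python
import math

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    vec = list(vec)
    n, h = len(vec), 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = vec[j], vec[j + h]
                vec[j], vec[j + h] = a + b, a - b  # butterfly step
        h *= 2
    s = 1.0 / math.sqrt(n)          # scale so H is orthonormal (H @ H = I)
    return [v * s for v in vec]

def hadanorm(activations):
    """HadaNorm-style preprocessing (sketch): mean-center each channel
    across tokens, then apply the Hadamard transform to each token."""
    n_tok, n_ch = len(activations), len(activations[0])
    means = [sum(row[c] for row in activations) / n_tok for c in range(n_ch)]
    centered = [[row[c] - means[c] for c in range(n_ch)] for row in activations]
    return [fwht(row) for row in centered]

def quantize_sym(x, bits=4):
    """Naive symmetric per-tensor quantizer (illustrative stand-in for PTQ)."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for row in x for v in row)
    scale = (amax / qmax) if amax > 0 else 1.0
    return [[round(v / scale) * scale for v in row] for row in x]
```

Because mean centering removes a channel's systematic offset and the Hadamard rotation mixes every channel into every output coordinate, a single extreme channel no longer dominates the dynamic range, so the 4-bit quantization grid is used far more effectively. Since the transform is linear and orthonormal, it can be folded into adjacent weight matrices at no inference cost, which is what makes this kind of preprocessing fine-tuning-free.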