RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization

📅 2025-09-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from prohibitive computational overhead, and under ultra-low-bit quantization the sensitivity and distributional complexity of their activations make activation quantization the primary bottleneck. To address this, the paper proposes RobuQ, a systematic quantization-aware training (QAT) framework tailored to DiTs. Starting from a strong ternary-weight (W1.58A4) baseline, RobuQ adds RobustQuantizer, a robust activation quantizer built on the theoretical result that the Hadamard transform converts unknown per-token activation distributions into per-token normal distributions, and AMPN, an activation-only mixed-precision pipeline that keeps ternary weights throughout the network while assigning a different activation bit-width to each layer. On ImageNet-1K, RobuQ is reported to be the first method to achieve stable image generation with activations quantized to an average of 2 bits, and it attains state-of-the-art sub-4-bit results on both unconditional and conditional generation.
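
The core idea behind RobustQuantizer, as summarized above, is to rotate activations with a Hadamard transform so that each token's channel distribution becomes approximately normal before low-bit quantization. Below is a minimal NumPy sketch of that idea; the per-token uniform quantizer, the explicit inverse rotation, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_per_token(x: np.ndarray, bits: int = 2) -> np.ndarray:
    """Simple per-token uniform quantization (fake-quant: quantize then dequantize)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # e.g. [-2, 1] for 2 bits
    scale = np.max(np.abs(x), axis=-1, keepdims=True) / -qmin
    scale = np.where(scale == 0, 1.0, scale)               # guard all-zero tokens
    return np.clip(np.round(x / scale), qmin, qmax) * scale

def hadamard_assisted_quant(x: np.ndarray, bits: int = 2) -> np.ndarray:
    """Rotate (tokens, channels) activations, quantize each token, rotate back."""
    H = hadamard(x.shape[-1])
    x_rot = x @ H                                   # per-token values become closer to Gaussian
    return quantize_per_token(x_rot, bits) @ H.T    # H is orthogonal, so H.T undoes the rotation

# Toy check: 4 tokens, 8 channels, one outlier-heavy channel
x = np.random.randn(4, 8) * np.array([1, 1, 1, 1, 1, 1, 1, 10.0])
print(np.abs(x - hadamard_assisted_quant(x, bits=2)).mean())
```

In practice the inverse rotation is usually folded into the next linear layer's weights rather than applied explicitly; it is written out here only to make the round trip easy to verify.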

📝 Abstract
Diffusion Transformers (DiTs) have recently emerged as a powerful backbone for image generation, demonstrating superior scalability and performance over U-Net architectures. However, their practical deployment is hindered by substantial computational and memory costs. While Quantization-Aware Training (QAT) has shown promise for U-Nets, its application to DiTs faces unique challenges, primarily due to the sensitivity and distributional complexity of activations. In this work, we identify activation quantization as the primary bottleneck for pushing DiTs to extremely low-bit settings. To address this, we propose a systematic QAT framework for DiTs, named RobuQ. We start by establishing a strong ternary weight (W1.58A4) DiT baseline. Building upon this, we propose RobustQuantizer to achieve robust activation quantization. Our theoretical analyses show that the Hadamard transform can convert unknown per-token distributions into per-token normal distributions, providing a strong foundation for this method. Furthermore, we propose AMPN, the first Activation-only Mixed-Precision Network pipeline for DiTs. This method applies ternary weights across the entire network while allocating different activation precisions to each layer to eliminate information bottlenecks. Through extensive experiments on unconditional and conditional image generation, our RobuQ framework achieves state-of-the-art performance for DiT quantization in sub-4-bit quantization configurations. To the best of our knowledge, RobuQ is the first to achieve stable and competitive image generation on large datasets like ImageNet-1K with activations quantized to an average of 2 bits. The code and models will be available at https://github.com/racoonykc/RobuQ.
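
The W1.58 baseline refers to ternary weights with values in {-1, 0, +1} (log2(3) ≈ 1.58 bits). A common recipe for such ternarization is absmean scaling followed by rounding, sketched below in NumPy; whether RobuQ uses exactly this scheme, or how it handles gradients during QAT (typically a straight-through estimator), is not specified here, so treat the snippet as an illustrative assumption.

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-8):
    """Absmean-style ternarization: map weights to {-1, 0, +1} with one per-tensor scale."""
    scale = np.mean(np.abs(w)) + eps                 # per-tensor scaling factor
    w_q = np.clip(np.round(w / scale), -1, 1)        # ternary codes
    return w_q, scale

def dequantize(w_q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate full-precision weights used in a fake-quant forward pass."""
    return w_q * scale

w = np.random.randn(4, 4)
w_q, s = ternarize(w)
print(w_q)                                       # entries in {-1, 0, 1}
print(np.abs(w - dequantize(w_q, s)).mean())     # average quantization error
```
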
Problem

Research questions and friction points this paper is trying to address.

Addressing activation quantization bottlenecks in Diffusion Transformers for low-bit deployment
Developing robust quantization methods for sensitive DiT activation distributions
Enabling competitive image generation with sub-4-bit quantized DiT models
Innovation

Methods, ideas, or system contributions that make the work stand out.

RobustQuantizer enables robust activation quantization for DiTs
AMPN pipeline allocates mixed activation precisions per layer (a toy allocation sketch follows this list)
Hadamard transform converts activations to normal distributions
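
AMPN keeps weights ternary everywhere but gives each layer its own activation bit-width so that an average budget (e.g. 2 bits) is met without starving sensitive layers. The greedy allocator below is a hypothetical stand-in for that idea: the sensitivity scores, the diminishing-returns model, and the function name are all assumptions, not the paper's actual AMPN algorithm.

```python
import numpy as np

def allocate_activation_bits(sensitivity, avg_bits=2.0, choices=(1, 2, 3, 4)):
    """Greedy per-layer activation bit allocation under an average-bit budget.

    `sensitivity[i]` is assumed to measure how much layer i degrades per lost bit
    (e.g. estimated on a calibration set). Illustrative only, not AMPN itself.
    """
    n = len(sensitivity)
    bits = np.full(n, min(choices))                       # start every layer at the lowest precision
    budget = int(round(avg_bits * n)) - int(bits.sum())   # extra bits left to distribute
    for _ in range(budget):
        gain = np.asarray(sensitivity) / (2.0 ** bits)    # assume diminishing returns per extra bit
        gain[bits >= max(choices)] = -np.inf              # layer already at the highest precision
        bits[int(np.argmax(gain))] += 1                   # spend one bit on the best candidate
    return bits

# Toy usage: 6 layers, average budget of 2 activation bits
sens = np.array([0.9, 0.1, 0.4, 0.05, 0.7, 0.2])
bits = allocate_activation_bits(sens, avg_bits=2.0)
print(bits, bits.mean())   # [4 1 2 1 3 1], mean 2.0
```
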
Authors

Kaicheng Yang
DeepGlint
Multimodal, CV, NLP
Xun Zhang
Shanghai Jiao Tong University
Haotong Qin
ETH Zürich
TinyML, Model Compression, Computer Vision, Deep Learning
Yucheng Lin
Shanghai Jiao Tong University
Kaisen Yang
Tsinghua University
Xianglong Yan
Shanghai Jiao Tong University
Efficient AI
Yulun Zhang
Shanghai Jiao Tong University