RobuQ: Pushing DiTs to W1.58A2 via Robust Activation Quantization

📅 2025-09-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from prohibitive computational overhead, and under ultra-low-bit quantization the sensitivity and distributional complexity of their activations make activation quantization the primary bottleneck. To address this, the paper proposes RobuQ, a systematic quantization-aware training (QAT) framework tailored to DiTs. Starting from a strong ternary-weight (W1.58A4) baseline, RobuQ adds RobustQuantizer, a robust activation quantizer built on the theoretical result that the Hadamard transform converts unknown per-token activation distributions into per-token normal distributions, and AMPN, an activation-only mixed-precision pipeline that keeps ternary weights throughout the network while assigning a different activation bit-width to each layer. On ImageNet-1K, RobuQ is reported to be the first method to achieve stable image generation with activations quantized to an average of 2 bits, and it attains state-of-the-art sub-4-bit results on both unconditional and conditional generation.
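
The core idea behind RobustQuantizer, as summarized above, is to rotate activations with a Hadamard transform so that each token's channel distribution becomes approximately normal before low-bit quantization. Below is a minimal NumPy sketch of that idea; the per-token uniform quantizer, the explicit inverse rotation, and the function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester's construction (n must be a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_per_token(x: np.ndarray, bits: int = 2) -> np.ndarray:
    """Simple per-token uniform quantization (fake-quant: quantize then dequantize)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1   # e.g. [-2, 1] for 2 bits
    scale = np.max(np.abs(x), axis=-1, keepdims=True) / -qmin
    scale = np.where(scale == 0, 1.0, scale)               # guard all-zero tokens
    return np.clip(np.round(x / scale), qmin, qmax) * scale

def hadamard_assisted_quant(x: np.ndarray, bits: int = 2) -> np.ndarray:
    """Rotate (tokens, channels) activations, quantize each token, rotate back."""
    H = hadamard(x.shape[-1])
    x_rot = x @ H                                   # per-token values become closer to Gaussian
    return quantize_per_token(x_rot, bits) @ H.T    # H is orthogonal, so H.T undoes the rotation

# Toy check: 4 tokens, 8 channels, one outlier-heavy channel
x = np.random.randn(4, 8) * np.array([1, 1, 1, 1, 1, 1, 1, 10.0])
print(np.abs(x - hadamard_assisted_quant(x, bits=2)).mean())
```

In practice the inverse rotation is usually folded into the next linear layer's weights rather than applied explicitly; it is written out here only to make the round trip easy to verify.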

📝 Abstract
Diffusion Transformers (DiTs) have recently emerged as a powerful backbone for image generation, demonstrating superior scalability and performance over U-Net architectures. However, their practical deployment is hindered by substantial computational and memory costs. While Quantization-Aware Training (QAT) has shown promise for U-Nets, its application to DiTs faces unique challenges, primarily due to the sensitivity and distributional complexity of activations. In this work, we identify activation quantization as the primary bottleneck for pushing DiTs to extremely low-bit settings. To address this, we propose a systematic QAT framework for DiTs, named RobuQ. We start by establishing a strong ternary weight (W1.58A4) DiT baseline. Building upon this, we propose RobustQuantizer to achieve robust activation quantization. Our theoretical analyses show that the Hadamard transform can convert unknown per-token distributions into per-token normal distributions, providing a strong foundation for this method. Furthermore, we propose AMPN, the first Activation-only Mixed-Precision Network pipeline for DiTs. This method applies ternary weights across the entire network while allocating different activation precisions to each layer to eliminate information bottlenecks. Through extensive experiments on unconditional and conditional image generation, our RobuQ framework achieves state-of-the-art performance for DiT quantization in sub-4-bit quantization configurations. To the best of our knowledge, RobuQ is the first to achieve stable and competitive image generation on large datasets like ImageNet-1K with activations quantized to an average of 2 bits. The code and models will be available at https://github.com/racoonykc/RobuQ.
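
The W1.58 baseline refers to ternary weights with values in {-1, 0, +1} (log2(3) ≈ 1.58 bits). A common recipe for such ternarization is absmean scaling followed by rounding, sketched below in NumPy; whether RobuQ uses exactly this scheme, or how it handles gradients during QAT (typically a straight-through estimator), is not specified here, so treat the snippet as an illustrative assumption.

```python
import numpy as np

def ternarize(w: np.ndarray, eps: float = 1e-8):
    """Absmean-style ternarization: map weights to {-1, 0, +1} with one per-tensor scale."""
    scale = np.mean(np.abs(w)) + eps                 # per-tensor scaling factor
    w_q = np.clip(np.round(w / scale), -1, 1)        # ternary codes
    return w_q, scale

def dequantize(w_q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate full-precision weights used in a fake-quant forward pass."""
    return w_q * scale

w = np.random.randn(4, 4)
w_q, s = ternarize(w)
print(w_q)                                       # entries in {-1, 0, 1}
print(np.abs(w - dequantize(w_q, s)).mean())     # average quantization error
```
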
Problem

Research questions and friction points this paper is trying to address.

Addressing activation quantization bottlenecks in Diffusion Transformers for low-bit deployment
Developing robust quantization methods for sensitive DiT activation distributions
Enabling competitive image generation with sub-4-bit quantized DiT models
Innovation

Methods, ideas, or system contributions that make the work stand out.

RobustQuantizer enables robust activation quantization for DiTs
AMPN pipeline allocates mixed activation precisions per layer (a toy allocation sketch follows this list)
Hadamard transform converts activations to normal distributions
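
AMPN keeps weights ternary everywhere but gives each layer its own activation bit-width so that an average budget (e.g. 2 bits) is met without starving sensitive layers. The greedy allocator below is a hypothetical stand-in for that idea: the sensitivity scores, the diminishing-returns model, and the function name are all assumptions, not the paper's actual AMPN algorithm.

```python
import numpy as np

def allocate_activation_bits(sensitivity, avg_bits=2.0, choices=(1, 2, 3, 4)):
    """Greedy per-layer activation bit allocation under an average-bit budget.

    `sensitivity[i]` is assumed to measure how much layer i degrades per lost bit
    (e.g. estimated on a calibration set). Illustrative only, not AMPN itself.
    """
    n = len(sensitivity)
    bits = np.full(n, min(choices))                       # start every layer at the lowest precision
    budget = int(round(avg_bits * n)) - int(bits.sum())   # extra bits left to distribute
    for _ in range(budget):
        gain = np.asarray(sensitivity) / (2.0 ** bits)    # assume diminishing returns per extra bit
        gain[bits >= max(choices)] = -np.inf              # layer already at the highest precision
        bits[int(np.argmax(gain))] += 1                   # spend one bit on the best candidate
    return bits

# Toy usage: 6 layers, average budget of 2 activation bits
sens = np.array([0.9, 0.1, 0.4, 0.05, 0.7, 0.2])
bits = allocate_activation_bits(sens, avg_bits=2.0)
print(bits, bits.mean())   # [4 1 2 1 3 1], mean 2.0
```
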
Authors

Kaicheng Yang
DeepGlint
Multimodal, CV, NLP
Xun Zhang
Shanghai Jiao Tong University
Haotong Qin
ETH Zürich
TinyML, Model Compression, Computer Vision, Deep Learning
Yucheng Lin
Shanghai Jiao Tong University
Kaisen Yang
Tsinghua University
Xianglong Yan
Shanghai Jiao Tong University
Efficient AI
Yulun Zhang
Shanghai Jiao Tong University