TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search

📅 2025-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion Transformers (DiTs) suffer from prohibitive computational and memory overhead, hindering practical deployment. While mixed-precision quantization (MPQ) has proven effective for U-Net architectures, its application to DiTs remains underexplored, hampered by inefficient quantization-configuration search, optimization objectives misaligned between post-training quantization (PTQ) and quantization-aware training (QAT), and severe information bottlenecks under ultra-low-bit (≤4-bit) quantization. This paper proposes TreeQ, a novel MPQ framework for DiTs: (i) a tree-structured search strategy for efficient precision allocation; (ii) a unified optimization objective, Environmental Noise Guidance, that aligns PTQ and QAT with a single hyperparameter; and (iii) a generalizable Monarch branching mechanism that mitigates information loss at extremely low bit-widths. TreeQ jointly supports PTQ and QAT, and achieves near-lossless W3A3 and W4A4 quantization on DiT-XL/2, matching full-precision generation quality while substantially reducing compute cost. Code and models are publicly released.

📝 Abstract
Diffusion Transformers (DiTs) have emerged as a highly scalable and effective backbone for image generation, outperforming U-Net architectures in both scalability and performance. However, their real-world deployment remains challenging due to high computational and memory demands. Mixed-Precision Quantization (MPQ), designed to push the limits of quantization, has demonstrated remarkable success in advancing U-Net quantization to sub-4-bit settings while significantly reducing computational and memory overhead. Nevertheless, its application to DiT architectures remains limited and underexplored. In this work, we propose TreeQ, a unified framework addressing key challenges in DiT quantization. First, to tackle inefficient search and proxy misalignment, we introduce Tree Structured Search (TSS). This DiT-specific approach leverages the architecture's linear properties to traverse the solution space in O(n) time while improving objective accuracy through comparison-based pruning. Second, to unify optimization objectives, we propose Environmental Noise Guidance (ENG), which aligns Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) configurations using a single hyperparameter. Third, to mitigate information bottlenecks in ultra-low-bit regimes, we design the General Monarch Branch (GMB). This structured sparse branch prevents irreversible information loss, enabling finer detail generation. Through extensive experiments, our TreeQ framework demonstrates state-of-the-art performance on DiT-XL/2 under W3A3 and W4A4 PTQ/PEFT settings. Notably, our work is the first to achieve near-lossless 4-bit PTQ performance on DiT models. The code and models will be available at https://github.com/racoonykc/TreeQ.
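As background for the bit-width settings discussed above (W3A3, W4A4), the core operation being pushed to low precision is uniform quantization of a weight or activation tensor. A minimal NumPy sketch, not the paper's code (the function name and per-tensor scaling are illustrative assumptions), shows how reconstruction error grows as the bit-width shrinks:

```python
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric fake-quantization of a tensor to `bits` bits.
    Illustrative only: per-tensor scale, round-to-nearest, signed range."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit signed
    scale = np.max(np.abs(w)) / qmax    # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                    # dequantized ("fake-quantized") values

np.random.seed(0)
w = np.random.randn(64, 64).astype(np.float32)
for b in (8, 4, 3):
    err = np.mean((w - quantize_weights(w, b)) ** 2)
    print(f"W{b}: MSE {err:.5f}")
```

The sharp error growth below 4 bits is the information bottleneck that the paper's General Monarch Branch is designed to compensate for.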
Problem

Research questions and friction points this paper is trying to address.

Searching the per-layer quantization configuration space of DiTs is inefficient, and proxy objectives misalign with true generation quality
PTQ and QAT optimize different objectives, so a single configuration cannot serve both regimes
Ultra-low-bit (≤4-bit) quantization creates an information bottleneck that irreversibly destroys fine detail
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree Structured Search for efficient DiT quantization
Environmental Noise Guidance unifies PTQ and QAT objectives
General Monarch Branch prevents information loss in ultra-low-bit regimes
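The paper's Tree Structured Search itself is only described at a high level here, but the underlying idea of walking a per-layer bit-width tree and pruning branches by comparison, so only O(n) nodes are visited instead of the exponential full tree, can be sketched as a greedy allocation. This is an illustrative assumption, not the paper's actual algorithm: `greedy_bit_allocation`, the MSE proxy, and the `budget` threshold are all placeholders for TreeQ's real comparison-based objective.

```python
import numpy as np

def greedy_bit_allocation(layers, candidate_bits=(4, 3, 2), budget=0.05):
    """Greedy per-layer precision allocation: at each layer (tree node), keep
    only the lowest bit-width whose quantization error stays under `budget`,
    pruning all other branches -- so the walk is linear in the layer count."""
    config = {}
    for name, w in layers.items():
        chosen = max(candidate_bits)            # safe fallback: highest precision
        for bits in sorted(candidate_bits):     # try the cheapest branch first
            qmax = 2 ** (bits - 1) - 1
            scale = np.max(np.abs(w)) / qmax
            wq = np.clip(np.round(w / scale), -qmax - 1, qmax) * scale
            if np.mean((w - wq) ** 2) <= budget:
                chosen = bits
                break                           # prune the remaining branches
        config[name] = chosen
    return config

np.random.seed(0)
layers = {"attn.qkv": np.random.randn(32, 32),
          "mlp.fc1": 0.1 * np.random.randn(32, 32)}
print(greedy_bit_allocation(layers))
```

Layers with smaller dynamic range tolerate fewer bits under the same budget, which is the intuition behind mixed precision: spend bits only where the error proxy demands them.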
Kaicheng Yang, DeepGlint (Multimodal, CV, NLP)
Kaisen Yang, Tsinghua University
Baiting Wu, Tsinghua University
Xun Zhang, Shanghai Jiao Tong University
Qianrui Yang, Tsinghua University
Haotong Qin, ETH Zürich (TinyML, Model Compression, Computer Vision, Deep Learning)
He Zhang, Adobe Research
Yulun Zhang, Shanghai Jiao Tong University