AdaTSQ: Pushing the Pareto Frontier of Diffusion Transformers via Temporal-Sensitivity Quantization

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational and memory costs that hinder the deployment of Diffusion Transformers (DiTs) for image and video generation on edge devices. Existing post-training quantization methods overlook the temporal dynamics inherent in the diffusion process, leading to significant performance degradation. To overcome this limitation, we propose AdaTSQ, the first framework to incorporate temporal sensitivity into DiT quantization. AdaTSQ formulates bit-width allocation as a constrained path-planning problem and employs beam search, guided by end-to-end reconstruction error, to dynamically assign per-layer bit-widths at each diffusion timestep. It further incorporates a Fisher-information-based, temporally sensitive calibration mechanism to refine weight quantization. Evaluated on Flux-Dev, Flux-Schnell, Z-Image, and Wan2.1, AdaTSQ consistently outperforms SVDQuant and ViDiT-Q, achieving Pareto-optimal trade-offs between generation quality and bit efficiency.
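The bit-width search described above can be sketched as a beam search over per-layer, per-timestep assignments under an average-bit budget. This is an illustrative sketch only, not the paper's implementation: the `recon_error` proxy, the budget form, and all names are assumptions.

```python
def beam_search_bits(layers, timesteps, bit_choices, avg_budget,
                     recon_error, beam_width=4):
    """Beam search over per-layer, per-timestep bit-width assignments.

    recon_error(t, layer, bits) is a caller-supplied proxy for the
    end-to-end reconstruction error of quantizing `layer` to `bits`
    at timestep `t`; avg_budget caps the average assigned bit-width.
    """
    steps = [(t, l) for t in timesteps for l in layers]
    total_budget = avg_budget * len(steps)
    min_bit = min(bit_choices)
    # Each beam entry: (cumulative error, cumulative bits, assignment)
    beams = [(0.0, 0, {})]
    for i, (t, layer) in enumerate(steps):
        remaining = len(steps) - i - 1
        candidates = []
        for err, bits, assign in beams:
            for b in bit_choices:
                new_bits = bits + b
                # Prune partial paths that can no longer meet the budget
                if new_bits + remaining * min_bit > total_budget:
                    continue
                candidates.append((err + recon_error(t, layer, b),
                                   new_bits,
                                   {**assign, (t, layer): b}))
        # Keep only the lowest-error partial paths
        beams = sorted(candidates, key=lambda c: c[0])[:beam_width]
    return min(beams, key=lambda c: c[0])
```

With a toy error proxy that makes early timesteps more sensitive, the search allocates higher bit-widths to those timesteps and lower ones elsewhere, exactly the timestep-dynamic behavior the summary describes.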

📝 Abstract
Diffusion Transformers (DiTs) have emerged as the state-of-the-art backbone for high-fidelity image and video generation. However, their massive computational cost and memory footprint hinder deployment on edge devices. While post-training quantization (PTQ) has proven effective for large language models (LLMs), directly applying existing methods to DiTs yields suboptimal results due to the neglect of the unique temporal dynamics inherent in diffusion processes. In this paper, we propose AdaTSQ, a novel PTQ framework that pushes the Pareto frontier of efficiency and quality by exploiting the temporal sensitivity of DiTs. First, we propose a Pareto-aware timestep-dynamic bit-width allocation strategy. We model the quantization policy search as a constrained pathfinding problem and utilize a beam search algorithm, guided by end-to-end reconstruction error, to dynamically assign layer-wise bit-widths across timesteps. Second, we propose a Fisher-guided temporal calibration mechanism. It leverages temporal Fisher information to prioritize calibration data from highly sensitive timesteps, seamlessly integrating with Hessian-based weight optimization. Extensive experiments on four advanced DiTs (Flux-Dev, Flux-Schnell, Z-Image, and Wan2.1) demonstrate that AdaTSQ significantly outperforms state-of-the-art methods like SVDQuant and ViDiT-Q. Our code will be released at https://github.com/Qiushao-E/AdaTSQ.
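The Fisher-guided calibration idea can be illustrated with a minimal sketch: score each diffusion timestep by an empirical Fisher proxy (mean squared gradient magnitude), then sample calibration timesteps in proportion to those scores. The gradient proxy and all function names here are illustrative assumptions, not the paper's actual mechanism.

```python
import numpy as np

def fisher_timestep_weights(grads_per_timestep):
    """Score each timestep by an empirical Fisher proxy: the mean
    squared gradient magnitude observed at that timestep, then
    normalize the scores into a sampling distribution."""
    scores = np.array([np.mean(np.square(g)) for g in grads_per_timestep])
    return scores / scores.sum()

def sample_calibration_timesteps(weights, n_samples, rng=None):
    """Draw calibration timestep indices, favoring the timesteps
    with higher Fisher scores."""
    rng = np.random.default_rng(rng)
    return rng.choice(len(weights), size=n_samples, p=weights)
```

A timestep whose gradients are three times larger receives nine times the sampling weight, so highly sensitive timesteps dominate the calibration set, in line with the abstract's description of prioritizing sensitive timesteps.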
Problem

Research questions and friction points this paper is trying to address.

Diffusion Transformers
Post-Training Quantization
Temporal Dynamics
Edge Deployment
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal-Sensitivity Quantization
Diffusion Transformers
Post-Training Quantization
Pareto-aware Bit-width Allocation
Fisher-guided Calibration