AI Summary
Diffusion large language models (dLLMs) pose significant challenges for efficient deployment on resource-constrained edge devices due to their massive parameter counts and activation tensors containing prominent outliers. This work presents the first systematic study of post-training quantization (PTQ) for dLLMs, identifying outlier-dominated activation dynamic ranges as the primary cause of accuracy degradation under low-bit quantization. We conduct comprehensive empirical evaluations along four dimensions: bit-width, quantization method, task category, and model type. Based on these findings, we propose targeted PTQ optimizations tailored to dLLM characteristics. Experimental results demonstrate that our approach substantially improves quantization accuracy at 4-6 bits, achieving performance close to full precision in several configurations. To foster reproducibility and adoption, we publicly release all code and configuration files. This work provides both theoretical insights and practical guidelines for lightweight deployment of non-autoregressive generative models.
Abstract
Recent advances in diffusion large language models (dLLMs) have introduced a promising alternative to autoregressive (AR) LLMs for natural language generation tasks, leveraging full attention and denoising-based decoding strategies. However, the deployment of these models on edge devices remains challenging due to their massive parameter scale and high resource demands. While post-training quantization (PTQ) has emerged as a widely adopted technique for compressing AR LLMs, its applicability to dLLMs remains largely unexplored. In this work, we present the first systematic study on quantizing diffusion-based language models. We begin by identifying the presence of activation outliers: abnormally large activation values that dominate the dynamic range. These outliers pose a key challenge to low-bit quantization because they make it difficult to preserve precision for the majority of values. More importantly, we implement state-of-the-art PTQ methods and conduct a comprehensive evaluation across multiple task types and model variants. Our analysis is structured along four key dimensions: bit-width, quantization method, task category, and model type. Through this multi-perspective evaluation, we offer practical insights into the quantization behavior of dLLMs under different configurations. We hope our findings provide a foundation for future research in efficient dLLM deployment. All code and experimental setups will be released to support the community.
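To make the outlier problem concrete, the following minimal sketch (not taken from the paper; it uses generic symmetric per-tensor quantization and synthetic activations) shows how a single large outlier inflates the quantization scale and wipes out precision for the bulk of the values:

```python
import numpy as np

def quantize_dequantize(x, bits=4):
    # Symmetric per-tensor quantization: the scale is set by the largest
    # absolute value, so one outlier stretches the grid for every element.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, 1024)        # typical activation distribution
acts_outlier = np.append(acts, 100.0)    # one outlier dominating the range

# Mean absolute reconstruction error on the "normal" values only.
err_clean = np.abs(quantize_dequantize(acts, 4) - acts).mean()
err_outlier = np.abs(quantize_dequantize(acts_outlier, 4)[:-1] - acts).mean()

print(f"4-bit error without outlier: {err_clean:.4f}")
print(f"4-bit error with outlier:    {err_outlier:.4f}")
```

With the outlier present, most activations fall into the rounding bin around zero and are reconstructed as 0, so the error on the ordinary values grows severalfold. This is exactly why outlier-dominated dynamic ranges are singled out as the bottleneck for low-bit dLLM quantization.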