How Efficient Are Diffusion Language Models? A Critical Examination of Efficiency Evaluation Practices

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion language models (DLMs) suffer from significantly lower end-to-end inference throughput than autoregressive (AR) models, yet existing efficiency evaluation methodologies exhibit systemic flaws: they neglect hardware bottlenecks and batch-size scaling effects, and model decoding parallelism inadequately. Method: We establish a comprehensive empirical benchmark across multiple DLMs and hardware platforms, augmented by roofline-model-based theoretical throughput analysis to quantify compute-utilization bottlenecks across batch sizes. Contribution/Results: Our analysis reveals that acceleration techniques such as dual-cache scheduling and parallel denoising deliver diminishing returns beyond small batch sizes, with throughput gains collapsing in large-batch regimes. Crucially, no open-source DLM consistently surpasses AR models in end-to-end throughput. We propose a robust, hardware-aware efficiency evaluation framework for DLMs, providing both theoretical grounding and an empirically validated benchmark to guide co-design of architectures and systems.

📝 Abstract
Diffusion language models (DLMs) have emerged as a promising alternative to the long-dominant autoregressive (AR) paradigm, offering a parallelizable decoding process that could yield greater efficiency. Yet, in practice, current open-source DLMs often underperform their AR counterparts in speed, limiting their real-world utility. This work presents a systematic study of DLM efficiency, identifying key issues in prior evaluation methods. Through empirical benchmarking and a roofline-based theoretical analysis, we demonstrate that AR models generally achieve higher throughput, while DLMs consistently lag. We also investigate acceleration strategies, finding that techniques like dual cache and parallel decoding mainly offer gains at small batch sizes, with their benefits diminishing upon scaling. Our findings underscore the necessity of robust evaluation methods and improved acceleration strategies to advance research on DLMs.
Problem

Research questions and friction points this paper is trying to address.

Examining efficiency evaluation practices for diffusion language models
Identifying key issues in prior DLM efficiency assessment methods
Analyzing throughput disparities between diffusion and autoregressive models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic benchmarking of diffusion language model efficiency
Roofline-based theoretical analysis of model throughput
Empirical investigation of dual cache acceleration strategies
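To make the roofline-based analysis above concrete, here is a minimal sketch of how a roofline throughput ceiling explains why decoding speedups fade at large batch sizes. All hardware figures (A100-like peaks) and the 8B model size are illustrative assumptions, not numbers from the paper.

```python
def attainable_flops(peak_flops: float, mem_bandwidth: float,
                     arithmetic_intensity: float) -> float:
    """Roofline model: performance is capped by compute or memory bandwidth.

    arithmetic_intensity is FLOPs performed per byte moved from memory.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def decode_step_intensity(params: float, batch_size: int) -> float:
    """Rough arithmetic intensity of one decoding step of a dense transformer.

    Each step costs ~2 * params FLOPs per sequence, while the fp16 weights
    (2 bytes/param) are read once regardless of batch size, so intensity
    grows roughly linearly with batch size.
    """
    flops = 2.0 * params * batch_size
    bytes_moved = 2.0 * params  # fp16 weights read once per step
    return flops / bytes_moved


# Assumed A100-like hardware: 312 TFLOP/s fp16 peak, 2.0 TB/s HBM bandwidth.
PEAK, BW = 312e12, 2.0e12
PARAMS = 8e9  # assumed 8B-parameter model

for bs in (1, 16, 256):
    ceiling = attainable_flops(PEAK, BW, decode_step_intensity(PARAMS, bs))
    print(f"batch={bs:>3}: {ceiling / 1e12:.0f} TFLOP/s ceiling")
```

At batch size 1 the ceiling is memory-bound (2 TFLOP/s here), so filling idle compute with parallel denoising helps; by batch size 256 the step is compute-bound at the 312 TFLOP/s peak, leaving no headroom for such tricks, which is consistent with the paper's finding that dual-cache and parallel-decoding gains collapse at scale.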
Han Peng
Gaoling School of Artificial Intelligence, Renmin University of China
Peiyu Liu
University of International Business and Economics
Zican Dong
Renmin University of China
NLP · long text modeling · LLM
Daixuan Cheng
Gaoling School of AI, Renmin University of China
LLM Pre-Training · Domain Adaptation · Reasoning
Junyi Li
Department of Data Science, City University of Hong Kong
Yiru Tang
Gaoling School of Artificial Intelligence, Renmin University of China
Shuo Wang
Tsinghua University
Wayne Xin Zhao
Professor, Renmin University of China
Recommender System · Natural Language Processing · Large Language Model