Discrete Diffusion in Large Language and Multimodal Models: A Survey

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of autoregressive language models—including poor parallelizability, limited fine-grained controllability, and lack of dynamic response awareness—this survey systematically reviews recent advances in discrete diffusion language models (dLLMs) and multimodal diffusion models (dMLLMs). It introduces a unified mathematical framework that clarifies their historical development and establishes a principled taxonomy. The core paradigm centers on full-attention-driven, multi-token parallel denoising generation, integrating discrete probabilistic modeling, token-level noise scheduling, multi-stage training, and cross-modal alignment. The survey covers over 100 open-source and industrial models, showing that dLLMs/dMLLMs match or approach autoregressive models' performance across language, vision-language, and biological sequence tasks, while achieving up to 10× inference speedup, stronger output controllability, and improved dynamic response capability.

📝 Abstract
In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception—capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10x acceleration in inference speed. The advancement of discrete diffusion LLMs and MLLMs has been largely driven by progress in two domains. The first is the development of autoregressive LLMs and MLLMs, which has accumulated vast amounts of data, benchmarks, and foundational infrastructure for training and inference. The second is the evolution of the mathematical models underlying discrete diffusion. Together, these advancements catalyzed a surge in dLLM and dMLLM research in early 2025. In this work, we present a comprehensive overview of research in the dLLM and dMLLM domains. We trace their historical development, formalize the underlying mathematical frameworks, and categorize representative models. We further analyze key techniques for training and inference, and summarize emerging applications across language, vision-language, and biological domains. We conclude by discussing future directions for research and deployment. Paper collection: https://github.com/LiQiiiii/DLLM-Survey
Problem

Research questions and friction points this paper is trying to address.

Survey discrete diffusion models in language and multimodal tasks
Compare discrete diffusion models with autoregressive models
Analyze training, inference, and applications of diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel decoding with full attention
Denoising-based generation strategy
Dynamic response-aware perception
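The decoding paradigm the survey highlights can be sketched in a few lines: start from a fully masked sequence, predict all masked positions in parallel at each step, and commit only the most confident predictions before the next denoising step. The sketch below is a minimal toy illustration, not the survey's algorithm; `toy_model`, its vocabulary, and the confidence scores are stand-in assumptions for a real full-attention denoiser.

```python
import random

MASK = "<mask>"
VOCAB = ["the", "cat", "sat", "on", "mat"]

def toy_model(tokens):
    """Stand-in for a full-attention denoiser: for each masked position,
    return a (predicted_token, confidence) pair. A real dLLM would instead
    run a bidirectional transformer over the whole sequence in one forward
    pass and read off per-position token distributions."""
    rng = random.Random(0)  # deterministic toy confidences
    return {i: (VOCAB[i % len(VOCAB)], rng.random())
            for i, tok in enumerate(tokens) if tok == MASK}

def diffusion_decode(length=5, steps=3):
    """Iteratively denoise a fully masked sequence: at each step, predict
    all masked tokens in parallel, then unmask only the most confident
    fraction, so the sequence is fully revealed after `steps` iterations."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = toy_model(tokens)
        if not preds:
            break
        # Keep the top-k most confident predictions this step.
        k = max(1, len(preds) // (steps - step))
        for i, (tok, _conf) in sorted(preds.items(),
                                      key=lambda kv: -kv[1][1])[:k]:
            tokens[i] = tok
    return tokens
```

Because every masked position is predicted in the same forward pass, the number of model calls is the (small, fixed) number of denoising steps rather than the sequence length, which is the source of the parallel-decoding speedup the survey reports.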