Unifying Masked Diffusion Models with Various Generation Orders and Beyond

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of existing masked diffusion language models (MDMs): their performance is constrained by predefined or staged generation orders that are suboptimal and incur additional computational overhead. To overcome this, the authors propose OeMDM, a unified framework that, for the first time, integrates the autoregressive, block-wise diffusion, and masked diffusion paradigms under a single perspective. Building on this framework, they introduce LoMDM, a model that jointly optimizes a learnable, context-aware generation order together with the diffusion backbone, trained end-to-end with a single objective. Experiments show that LoMDM significantly outperforms current discrete diffusion approaches across multiple language modeling benchmarks.

📝 Abstract
Masked diffusion models (MDMs) are a potential alternative to autoregressive models (ARMs) for language generation, but their generation quality depends critically on the generation order. Prior work either hard-codes an ordering (e.g., blockwise left-to-right) or learns an ordering policy for a pretrained MDM, which incurs extra cost and can yield suboptimal solutions due to the two-stage optimization. Motivated by this, we propose the order-expressive masked diffusion model (OeMDM), which covers a broad class of diffusion generative processes with various generation orders and allows MDMs, ARMs, and block diffusion to be interpreted within a single framework. Building on OeMDM, we introduce the learnable-order masked diffusion model (LoMDM), which jointly learns the generation ordering and the diffusion backbone from scratch through a single objective, enabling the model to generate text in a context-dependent order. Empirically, we confirm that LoMDM outperforms various discrete diffusion models across multiple language modeling benchmarks.
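To make the unification concrete, here is a minimal toy sketch (not the paper's actual method or code) of masked-diffusion decoding in which a single "order policy" recovers the three paradigms the abstract mentions: a left-to-right policy gives autoregressive generation, a within-block policy gives block diffusion, and an unrestricted policy gives vanilla any-order MDM sampling. All function names and the trivial copy-the-target "denoiser" are illustrative assumptions.

```python
# Hypothetical sketch: masked-diffusion decoding where one order policy
# subsumes AR, blockwise, and any-order generation. Not the paper's code.
import random

MASK = "<mask>"

def order_policy(masked_positions, mode, block_size=2, rng=None):
    """Pick the next position to unmask. In LoMDM this choice would be
    learned and context-aware; here we use fixed toy policies."""
    if mode == "ar":           # left-to-right: autoregressive special case
        return min(masked_positions)
    if mode == "block":        # random order inside the earliest unfinished block
        first = min(masked_positions)
        block = [p for p in masked_positions
                 if p // block_size == first // block_size]
        return rng.choice(block)
    return rng.choice(masked_positions)  # any-order: vanilla MDM

def decode(target, mode, seed=0):
    """Toy denoiser: reveals the target token at the chosen position each
    step. A real model would sample from its predictive distribution."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    trace = []                 # order in which positions were unmasked
    while MASK in seq:
        masked = [i for i, t in enumerate(seq) if t == MASK]
        pos = order_policy(masked, mode, rng=rng)
        seq[pos] = target[pos]
        trace.append(pos)
    return seq, trace

tokens = ["the", "cat", "sat", "down"]
print(decode(tokens, "ar")[1])      # left-to-right unmasking order
print(decode(tokens, "block")[1])   # block-by-block order
print(decode(tokens, "random")[1])  # arbitrary order
```

The point of the sketch is that only `order_policy` changes between paradigms; the denoising loop is shared, which is what lets a learnable, context-dependent policy be trained jointly with the backbone under one objective.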
Problem

Research questions and friction points this paper is trying to address.

masked diffusion models
generation order
language generation
autoregressive models
discrete diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

masked diffusion models
generation order
learnable ordering
unified framework
language generation