🤖 AI Summary
Prior theoretical analyses of masked discrete diffusion for high-dimensional text generation are insufficient—existing works either neglect mainstream Euler samplers or rely on bounded score assumptions, failing to reveal masked diffusion’s advantages over uniform diffusion.
Method: We propose Mask-Aware Truncated Uniformization (MATU), the first sampler for masked discrete diffusion that preserves unbiased discrete score approximation, operates without bounded-score assumptions, and fully exploits the structural constraint that each token is unmasked at most once.
Results: Under total variation distance, standard Euler sampling incurs complexity $\tilde{O}(d^{2}\varepsilon^{-3/2})$, whereas MATU achieves $O(d\ln d\cdot(1-\varepsilon^{2}))$. Crucially, MATU eliminates the $\ln(1/\varepsilon)$ factor, significantly accelerating convergence and establishing the theoretical superiority of masked diffusion over uniform alternatives.
📝 Abstract
We study masked discrete diffusion -- a flexible paradigm for text generation in which tokens are progressively corrupted by special mask symbols before being denoised. Although this approach has demonstrated strong empirical performance, its theoretical complexity in high-dimensional settings remains insufficiently understood. Existing analyses largely focus on uniform discrete diffusion, and more recent attempts addressing masked diffusion either (1) overlook widely used Euler samplers, (2) impose restrictive bounded-score assumptions, or (3) fail to showcase the advantages of masked discrete diffusion over its uniform counterpart. To address this gap, we show that Euler samplers can achieve $\varepsilon$-accuracy in total variation (TV) with $\tilde{O}(d^{2}\varepsilon^{-3/2})$ discrete score evaluations, thereby providing the first rigorous analysis of typical Euler samplers in masked discrete diffusion. We then propose a Mask-Aware Truncated Uniformization (MATU) approach that both removes bounded-score assumptions and preserves unbiased discrete score approximation. By exploiting the property that each token can be unmasked at most once, MATU attains a nearly $\varepsilon$-free complexity of $O(d\,\ln d\cdot(1-\varepsilon^{2}))$. This result surpasses existing uniformization methods under uniform discrete diffusion, eliminating the $\ln(1/\varepsilon)$ factor and substantially speeding up convergence. Our findings not only provide a rigorous theoretical foundation for masked discrete diffusion, showcasing its practical advantages over uniform diffusion for text generation, but also pave the way for future efforts to analyze diffusion-based language models developed under the masking paradigm.
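To make the masking dynamics concrete, here is a minimal toy sketch (not the paper's MATU algorithm) of the forward corruption and reverse denoising processes described above. The `denoiser` stub stands in for a learned discrete score model; the final assertion illustrates the structural property the analysis exploits, namely that each token position is unmasked at most once.

```python
import random

MASK = "[MASK]"

def forward_mask(tokens, t, rng):
    """Forward process: each token is independently masked with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]

def reverse_unmask(masked, denoiser, rng):
    """Reverse process: repeatedly pick a masked position and fill it in.
    Once a position is unmasked it stays fixed, so every position is
    touched at most one time."""
    out = list(masked)
    unmask_events = 0
    while MASK in out:
        i = rng.choice([j for j, tok in enumerate(out) if tok == MASK])
        out[i] = denoiser(out, i)  # a learned model in practice; a stub here
        unmask_events += 1
    return out, unmask_events

rng = random.Random(0)
sentence = ["the", "cat", "sat", "on", "the", "mat"]
corrupted = forward_mask(sentence, t=1.0, rng=rng)  # t=1: fully masked
# Oracle denoiser for illustration only: it just restores the true token.
restored, events = reverse_unmask(corrupted, lambda s, i: sentence[i], rng)
assert restored == sentence
assert events == len(sentence)  # exactly one unmask event per token
```

Because the number of unmask events is capped at $d$ (one per token), the reverse process terminates after linearly many transitions; this is the intuition behind MATU's nearly $\varepsilon$-free evaluation count, in contrast to uniform diffusion, where a token can change state repeatedly.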