On the Reasoning Abilities of Masked Diffusion Language Models

📅 2025-10-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work investigates the reasoning capabilities and parallel-generation efficiency of Masked Diffusion Models (MDMs) on structured reasoning tasks. Method: The authors establish, for the first time, theoretical equivalences between MDMs and both Chain-of-Thought (CoT)-augmented Transformers and padded looped Transformers (PLTs), under a finite-precision, logarithmic-width computational model. Contribution/Results: They prove that MDMs can solve every problem solvable by CoT Transformers and that MDMs enjoy intrinsic parallelism advantages on specific formal tasks, e.g., regular-language recognition. Empirical evaluation confirms that MDMs reason substantially more efficiently while preserving full parallelism. This is the first formal characterization of the reasoning complexity of MDMs, revealing a capacity for parallel reasoning beyond standard autoregressive models.

πŸ“ Abstract
Masked diffusion models (MDMs) for text offer a compelling alternative to traditional autoregressive language models. Parallel generation makes them efficient, but their computational capabilities and the limitations inherent to their parallelism remain largely unexplored. To this end, we characterize what types of reasoning problems MDMs can provably solve and how efficiently. We do this by connecting MDMs to the well-understood reasoning frameworks of chain of thought (CoT) and padded looped transformers (PLTs) in the finite-precision log-width setting: We show that MDMs and polynomially-padded PLTs are, in fact, equivalent in this setting, and that MDMs can solve all problems that CoT-augmented transformers can. Moreover, we showcase classes of problems (including regular languages) for which MDMs are inherently more efficient than CoT transformers, where parallel generation allows for substantially faster reasoning.
Problem

Research questions and friction points this paper is trying to address.

Characterize reasoning problems solvable by masked diffusion models
Establish equivalence between MDMs and polynomially-padded looped transformers
Identify problems where MDMs outperform CoT transformers in efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

MDMs equivalent to polynomially-padded looped transformers
MDMs solve all CoT-augmented transformer problems
Parallel generation enables faster reasoning for MDMs
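The claimed efficiency gap on regular languages can be illustrated with a toy parity recognizer (a hypothetical sketch, not the paper's construction): sequential CoT-style decoding spends one step per input symbol, while a parallel scan over the automaton's transitions, the kind of simultaneous position-wise update that parallel unmasking permits, finishes in O(log n) rounds.

```python
# Toy illustration of the parallelism gap on a regular language:
# "bit strings with an even number of 1s" (parity).
# This is an illustrative analogy, not the paper's formal construction.

def parity_sequential(bits):
    """CoT-style decoding analogue: one DFA step per symbol, n steps total."""
    state, steps = 0, 0
    for b in bits:
        state ^= b  # parity DFA transition
        steps += 1
    return state == 0, steps

def parity_parallel(bits):
    """MDM-style analogue: a Hillis-Steele parallel scan over XOR.

    Each round, every position combines with the neighbor `stride`
    away simultaneously, so the scan finishes in O(log n) rounds.
    """
    vals = list(bits)
    stride, rounds = 1, 0
    while stride < len(vals):
        vals = [vals[i] ^ vals[i - stride] if i >= stride else vals[i]
                for i in range(len(vals))]
        stride *= 2
        rounds += 1
    # vals[-1] now holds the XOR (parity) of the whole string.
    return vals[-1] == 0, rounds
```

For a 4-symbol input the sequential recognizer takes 4 steps while the parallel scan takes 2 rounds; the gap widens to n versus ceil(log2 n) as inputs grow, which is the shape of the speedup the Innovation bullets attribute to parallel generation.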