On the Role of Discreteness in Diffusion LLMs

📅 2025-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models face a fundamental tension when applied to language generation: the discrete, highly structured nature of text undermines existing approaches. Uniform noise scheduling ignores how information is distributed across positions, and token-level marginal training fails to capture multi-token dependencies. Method: the paper formalizes five necessary properties for diffusion-based language modeling and proves that continuous embedding diffusion and discrete token diffusion each satisfy only a subset of them, revealing an inherent trade-off. Through theoretical analysis of diffusion processes, a decoupling of diffusion mechanics from language-specific requirements, empirical diagnosis of large-scale diffusion LLMs, and attribution-based comparative analysis, it identifies the root cause: a structural mismatch between diffusion dynamics and linguistic constraints. Contribution/Results: the work establishes the first theoretical framework for building semantically coherent and structurally consistent diffusion language models, and proposes a paradigm that jointly optimizes parallel-decoding efficiency and linguistic structural fidelity.

📝 Abstract
Diffusion models offer appealing properties for language generation, such as parallel decoding and iterative refinement, but the discrete and highly structured nature of text challenges the direct application of diffusion principles. In this paper, we revisit diffusion language modeling from the perspectives of the diffusion process and of language modeling, and outline five properties that separate diffusion mechanics from language-specific requirements. We first categorize existing approaches into continuous diffusion in embedding space and discrete diffusion over tokens. We then show that each satisfies only part of the five essential properties and therefore reflects a structural trade-off. Through analyses of recent large diffusion language models, we identify two central issues: (i) uniform corruption does not respect how information is distributed across positions, and (ii) token-wise marginal training cannot capture multi-token dependencies during parallel decoding. These observations motivate diffusion processes that align more closely with the structure of text, and encourage future work toward more coherent diffusion language models.
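
To make the two families in the abstract concrete, here is a minimal sketch (not the paper's code; PyTorch assumed, all names illustrative) of the forward corruption step in each paradigm: Gaussian noising of embeddings for continuous diffusion, and absorbing-state masking for discrete diffusion. Note that the discrete process corrupts every position with the same probability, which is exactly the uniform-corruption behavior the paper questions.

```python
# Minimal sketch (not the paper's code) of the two forward corruption
# processes; assumes PyTorch, all names illustrative.
import torch

def continuous_forward(x0_emb, alpha_bar_t):
    """Continuous diffusion in embedding space:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    a = torch.as_tensor(alpha_bar_t)
    eps = torch.randn_like(x0_emb)
    return a.sqrt() * x0_emb + (1.0 - a).sqrt() * eps

def discrete_forward(tokens, mask_ratio_t, mask_id):
    """Discrete (absorbing-state) diffusion over tokens: every position is
    replaced by [MASK] independently with the same probability mask_ratio_t,
    i.e. corruption is uniform across positions regardless of how much
    information each token carries."""
    corrupt = torch.rand(tokens.shape) < mask_ratio_t
    return torch.where(corrupt, torch.full_like(tokens, mask_id), tokens)

# Toy usage: at a mid-trajectory timestep, roughly half the tokens are masked.
tokens = torch.tensor([[5, 17, 3, 42, 8, 11]])
print(discrete_forward(tokens, mask_ratio_t=0.5, mask_id=0))
```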
Problem

Research questions and friction points this paper is trying to address.

Addresses the structural mismatch between diffusion models and discrete text generation
Identifies the limitation of uniform corruption: it ignores how information is distributed across token positions
Highlights the failure of token-wise marginal training to capture multi-token dependencies during parallel decoding (see the sketch below)
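
The third point can be seen directly in the standard training and decoding loop of masked diffusion LLMs. The hedged sketch below (illustrative names, PyTorch assumed, not the paper's code) shows why: the loss is a sum of independent per-position cross-entropies, and parallel decoding samples each masked position from its own marginal, so jointly inconsistent token combinations are never penalized.

```python
# Sketch of token-wise marginal training and fully parallel decoding in a
# masked diffusion LM. Illustrative names, not the paper's code; assumes PyTorch.
import torch
import torch.nn.functional as F

def marginal_loss(logits, targets, masked):
    """Training objective: cross-entropy at each masked position, computed
    independently. No term couples predictions at different positions, so
    only per-position marginals p(x_i | context) are learned."""
    return F.cross_entropy(logits[masked], targets[masked])

def parallel_decode_step(logits, tokens, masked):
    """Decoding: sample every masked position at once, each from its own
    marginal. Tokens that are individually likely can be jointly incoherent
    (marginals favoring both "New York" and "Los Angeles" can, under
    independent sampling, emit "New Angeles"), because the joint distribution
    over masked positions is never modeled."""
    samples = torch.distributions.Categorical(logits=logits).sample()
    return torch.where(masked, samples, tokens)

# Toy usage: batch of 1, sequence length 4, vocabulary of 10.
logits = torch.randn(1, 4, 10)
tokens = torch.randint(0, 10, (1, 4))
masked = torch.tensor([[True, False, True, True]])
print(marginal_loss(logits, tokens, masked))
print(parallel_decode_step(logits, tokens, masked))
```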
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous diffusion in embedding space
Discrete diffusion over tokens
Aligning diffusion processes with the structure of text (see the sketch below)
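
The paper leaves the third direction open. One way to picture it, purely as an illustrative assumption and not the paper's proposal, is a corruption schedule whose per-position masking rate depends on how informative each token is: here, `surprisal` is an assumed per-token score from any reference model, and low-information positions are corrupted first.

```python
# Purely illustrative sketch of non-uniform, structure-aware corruption:
# mask low-information (low-surprisal) positions first. Not the paper's
# method; `surprisal` is an assumed per-token score from a reference model.
import torch

def structured_forward(tokens, surprisal, t, mask_id):
    """Corrupt roughly a fraction t of positions, preferring low-surprisal
    tokens. Per-position mask probability is higher where surprisal is lower,
    while the expected overall mask ratio still equals t."""
    inv = 1.0 / (surprisal + 1e-6)              # low surprisal -> high weight
    probs = t * inv * inv.numel() / inv.sum()   # rescale so mean prob ~= t
    corrupt = torch.rand(tokens.shape) < probs.clamp(max=1.0)
    return torch.where(corrupt, torch.full_like(tokens, mask_id), tokens)

# Toy usage: function words (low surprisal) get masked before content words.
tokens = torch.tensor([4, 981, 7, 3502, 12])
surprisal = torch.tensor([0.5, 6.0, 0.4, 7.5, 0.6])  # assumed scores in nats
print(structured_forward(tokens, surprisal, t=0.4, mask_id=0))
```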
👥 Authors

Ziqi Jin
MiroMind AI, Nanyang Technological University, Singapore

Bin Wang
MiroMind AI

Xiang Lin
MiroMind
Natural Language Processing

Lidong Bing
MiroMind, Alibaba DAMO, Tencent, CMU, CUHK
Natural Language Processing, Large Language Models, Large Multimodal Models

Aixin Sun
Nanyang Technological University, Singapore