Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

📅 2026-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of existing convergence theory for discrete diffusion models. KL-based analyses break down under singular priors such as the masked distribution, and total variation (TV) bounds scale with the state-space size $S$, which hinders their use in large-vocabulary settings. To overcome these issues, the authors propose a unified analytical framework based on adjoint equations that establishes dimension-free convergence guarantees in any integral probability metric (IPM). The approach handles both masked and uniform priors, supports time-inhomogeneous schedules, and relies on a single standard rate-matrix regularity assumption. It has four key technical ingredients: an analysis in the space of observables rather than of probability measures, a regularity analysis that yields bounds in any IPM, a coupling argument that removes the dependence on $S$ under uniform transitions, and a new score–marginal cancellation technique that removes it under masked transitions. To the authors' knowledge, this is the first convergence theory for discrete diffusion models that is entirely free of $S$. It thereby avoids the limitations of analyses based on KL divergence or TV distance and provides a general-purpose toolkit for future theoretical work.
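For reference, the standard objects behind this summary can be written as follows. The block assumes a continuous-time Markov chain on a finite state space with rate matrix $Q_t$ (nonnegative off-diagonal entries, rows summing to zero) and row-vector marginals $p_t$. It is textbook background plus a generic error identity of the kind an adjoint-based analysis can exploit, not the paper's exact statements. $\hat Q_t$ and $\hat p_t$ denote a hypothetical learned chain.

```latex
% Integral probability metric over a class \mathcal{F} of observables f : [S] \to \mathbb{R}
d_{\mathcal{F}}(\mu,\nu) = \sup_{f \in \mathcal{F}} \bigl| \mathbb{E}_{\mu} f - \mathbb{E}_{\nu} f \bigr|,
\qquad
\mathcal{F} = \{ f : \|f\|_\infty \le 1 \} \;\Rightarrow\; d_{\mathcal{F}}(\mu,\nu) = \|\mu - \nu\|_1 = 2\,\mathrm{TV}(\mu,\nu).

% Forward (Kolmogorov) equation for the marginals; adjoint (backward) equation for an observable f
\partial_t p_t = p_t Q_t,
\qquad
\partial_s u_s = -Q_s u_s, \quad u_T = f.

% Duality: \tfrac{d}{ds}\langle p_s, u_s \rangle = p_s Q_s u_s - p_s Q_s u_s = 0, hence
\langle p_T, f \rangle = \langle p_0, u_0 \rangle.

% Comparing with a second chain (\hat p_t, \hat Q_t) started from \hat p_0 gives
\langle p_T - \hat p_T, f \rangle
  = \langle p_0 - \hat p_0, u_0 \rangle
  + \int_0^T \hat p_s \,(Q_s - \hat Q_s)\, u_s \, ds.
```

In this form, bounding the right-hand side uniformly over $f \in \mathcal{F}$ bounds $d_{\mathcal{F}}(p_T, \hat p_T)$. The quantities to control are the adjoint solutions $u_s$, which live in the space of observables rather than in the space of probability vectors.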

📝 Abstract

Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and is compatible with time-inhomogeneous schedules. Four novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and a score-marginal cancellation technique that removes $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models.
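The observable-space viewpoint can be checked numerically on a toy chain. The sketch below compares two computations: propagating a distribution forward and then evaluating an observable, versus propagating the observable backward through the adjoint equation and then pairing it with the initial distribution. It makes several assumptions that do not come from the paper. The chain is small, the forward-noising generator is $Q_t = \beta(t)\,Q$ with a hypothetical schedule $\beta(t) = 1 + t$, and it uses the standard uniform and masked (absorbing) rate matrices. Plain RK4 stands in for an ODE solver. All function names are illustrative only, not the authors' code.

```python
import numpy as np


def uniform_generator(S):
    """Uniform rate matrix on S states: jump to each other state at rate 1/S."""
    Q = np.full((S, S), 1.0 / S)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))  # rows sum to zero
    return Q


def masked_generator(S):
    """Masked (absorbing) rate matrix on S tokens plus a mask state at index S."""
    Q = np.zeros((S + 1, S + 1))
    Q[:S, S] = 1.0                        # each token jumps to the mask at unit rate
    Q[np.arange(S), np.arange(S)] = -1.0  # token diagonals; the mask row stays zero (absorbing)
    return Q


def beta(t):
    """Hypothetical time-inhomogeneous schedule; the generator at time t is beta(t) * Q."""
    return 1.0 + t


def rk4(rhs, y0, t0, t1, n_steps):
    """Classical RK4 for dy/dt = rhs(t, y) from t0 to t1 (t1 < t0 integrates backward)."""
    h = (t1 - t0) / n_steps
    y, t = np.asarray(y0, dtype=float), t0
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h / 2 * k1)
        k3 = rhs(t + h / 2, y + h / 2 * k2)
        k4 = rhs(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


def forward_marginal(Q, p0, T, n_steps=2000):
    """Forward Kolmogorov equation dp/dt = p Q_t, integrated from 0 to T."""
    return rk4(lambda t, p: beta(t) * (p @ Q), p0, 0.0, T, n_steps)


def adjoint_observable(Q, f, T, n_steps=2000):
    """Adjoint equation du/ds = -Q_s u with u_T = f, integrated from T back to 0."""
    return rk4(lambda s, u: -beta(s) * (Q @ u), f, T, 0.0, n_steps)


def duality_gap(Q, p0, f, T=1.0):
    """|<p_T, f> - <p_0, u_0>|: forward in measures vs backward in observables."""
    return abs(forward_marginal(Q, p0, T) @ f - p0 @ adjoint_observable(Q, f, T))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for name, Q in [("uniform", uniform_generator(6)), ("masked", masked_generator(6))]:
        n = Q.shape[0]
        p0 = rng.dirichlet(np.ones(n))       # random initial distribution
        f = rng.uniform(-1.0, 1.0, size=n)   # bounded observable, |f| <= 1
        print(name, duality_gap(Q, p0, f))   # expected to be at rounding-error level
```

Because every generator here is a multiple of one fixed $Q$, the marginals also have closed forms, which makes the sketch easy to sanity-check. For example, a token that starts unmasked is still unmasked at time $T$ with probability $\exp(-\int_0^T \beta)$.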

Problem

Research questions and friction points this paper is trying to address.

discrete diffusion models

convergence

dimension-free

integral probability metric

singular priors

Innovation

Methods, ideas, or system contributions that make the work stand out.

adjoint equations

dimension-free convergence

discrete diffusion models