🤖 AI Summary
This paper investigates the non-asymptotic convergence of discrete diffusion models (DDMs) on finite discrete spaces $\mathbb{Z}_m^d$ and countably infinite spaces $\mathbb{N}^d$. Addressing two forward dynamics (masking and random walk), we establish, for the first time without assuming boundedness of the discrete score function, an error-bound framework grounded in the monotonicity of the discrete score. Our theoretical guarantees apply broadly across diverse discrete noise mechanisms and yield convergence rates that scale linearly (up to logarithmic factors) with dimension, markedly improving high-dimensional scalability. The key contribution is a departure from continuous-diffusion theory: we provide the first unified, practical, and scalable non-asymptotic convergence guarantee specifically tailored to DDMs operating on discrete state spaces.
📝 Abstract
We investigate the theoretical underpinnings of Discrete Diffusion Models (DDMs) on discrete state spaces. Unlike the continuous setting, where diffusion models are well understood both theoretically and empirically, the discrete case poses significant challenges due to its combinatorial structure and the lack of rigorous analysis. In this work, we establish convergence guarantees for DDMs on both the finite space $\mathbb{Z}^d_m=\{0,\dots,m-1\}^d$ and the countably infinite space $\mathbb{N}^d$ under mild assumptions, focusing on forward masking and random walk dynamics. As in the continuous case, the backward process can be characterized by a discrete score function, whose monotonicity plays a central role in deriving the error bounds on the generated data. Notably, the complexity of our model scales linearly (up to logarithmic factors), rather than exponentially, with the dimension, making it efficiently scalable to high-dimensional data. To the best of our knowledge, this study provides the first non-asymptotic convergence guarantees that do not rely on boundedness of the estimated score, covering not only uniform noising processes on $\mathbb{Z}^d_m$ and $\mathbb{N}^d$, but also masking-based noising dynamics.
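To make the two forward dynamics concrete, here is a minimal Python sketch of one noising step under each mechanism on $\mathbb{Z}_m^d$. The exponential-rate parametrization, the function names, and the choice of $m$ as an extra absorbing mask symbol are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import math
import random

def mask_forward_step(x, m, dt, rate=1.0, mask_token=None):
    """One step of a masking (absorbing) forward process on Z_m^d:
    each unmasked coordinate is independently absorbed into a mask
    state with probability 1 - exp(-rate * dt).
    Illustrative sketch; the rate parametrization is an assumption."""
    if mask_token is None:
        mask_token = m  # treat m as an extra absorbing "mask" symbol
    p = 1.0 - math.exp(-rate * dt)
    return [mask_token if (xi != mask_token and random.random() < p) else xi
            for xi in x]

def random_walk_forward_step(x, m, dt, rate=1.0):
    """One step of a uniform random-walk forward process on Z_m^d:
    each coordinate independently jumps to a uniformly random state
    with probability 1 - exp(-rate * dt), otherwise stays put."""
    p = 1.0 - math.exp(-rate * dt)
    return [random.randrange(m) if random.random() < p else xi
            for xi in x]
```

As $t \to \infty$ the masking process absorbs every coordinate into the mask state, while the random-walk process converges to the uniform distribution on $\mathbb{Z}_m^d$; the reverse-time dynamics that undo these corruptions are driven by the discrete score function analyzed in the paper.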