🤖 AI Summary
Discrete diffusion models face a "sampling wall" in parallel decoding: categorical sampling collapses the predicted distribution into one-hot vectors, so no distributional information can propagate across decoding steps. To address this, we propose Loopholing, a mechanism that adds a deterministic latent pathway carrying the full categorical distribution from step to step, avoiding this information loss. Combined with a self-conditioning training strategy, this yields the Loopholing Discrete Diffusion Model (LDDM), which enables efficient parallel text generation without complex scheduling. Experiments show that LDDMs reduce generative perplexity by up to 61% over prior baselines and substantially improve accuracy and coherence on reasoning tasks such as Countdown and Game of 24. Notably, LDDMs match, and in some cases surpass, autoregressive models in both generation quality and logical reasoning.
📝 Abstract
Discrete diffusion models offer a promising alternative to autoregressive generation through parallel decoding, but they suffer from a sampling wall: once categorical sampling occurs, rich distributional information collapses into one-hot vectors and cannot be propagated across steps, forcing subsequent steps to operate with limited information. To mitigate this problem, we introduce Loopholing, a novel and simple mechanism that preserves this information via a deterministic latent pathway, leading to Loopholing Discrete Diffusion Models (LDDMs). Trained efficiently with a self-conditioning strategy, LDDMs achieve substantial gains: reducing generative perplexity by up to 61% over prior baselines, closing (and in some cases surpassing) the gap with autoregressive models, and producing more coherent text. Applied to reasoning tasks, LDDMs also improve performance on arithmetic benchmarks such as Countdown and Game of 24. These results further indicate that loopholing mitigates idle steps and oscillations, providing a scalable path toward high-quality non-autoregressive text generation.
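The core idea can be sketched in a few lines: at each parallel decoding step, standard discrete diffusion passes only the sampled one-hot tokens forward, while loopholing additionally carries the denoiser's full output distribution as a deterministic latent into the next step. The toy code below is a minimal illustration of that control flow, not the paper's architecture; the `denoiser` function, its mixing weights, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ, STEPS = 8, 4, 3

def denoiser(tokens_onehot, latent):
    """Hypothetical denoiser: unlike a plain discrete diffusion step, it sees
    both the collapsed one-hot tokens (stochastic pathway) and the previous
    step's full distribution (deterministic latent pathway)."""
    logits = (2.0 * tokens_onehot            # sampled tokens
              + 1.0 * latent                 # carried distributional info
              + 0.1 * rng.standard_normal((SEQ, VOCAB)))
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    return probs / probs.sum(-1, keepdims=True)

# Initialize: uniform latent, random token draws.
latent = np.full((SEQ, VOCAB), 1.0 / VOCAB)
tokens = np.eye(VOCAB)[rng.integers(VOCAB, size=SEQ)]

for _ in range(STEPS):
    probs = denoiser(tokens, latent)
    latent = probs                                        # deterministic pathway: no sampling
    sampled = np.array([rng.choice(VOCAB, p=p) for p in probs])
    tokens = np.eye(VOCAB)[sampled]                       # stochastic pathway: one-hot collapse

print(latent.shape, tokens.shape)
```

Without the `latent` argument, each step would start from one-hot vectors alone, which is exactly the "sampling wall" the abstract describes; the extra pathway is what lets distributional information survive the categorical sampling at every step.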