The Diffusion Duality, Chapter II: $\Psi$-Samplers and Efficient Curriculum

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a novel approach to discrete diffusion models by introducing a general-purpose Predictor-Corrector (PC) sampling framework, challenging the prevailing assumption that Masked diffusion dominates language modeling. By integrating Gaussian-relaxed training with a memory-efficient curriculum learning strategy, the method significantly enhances both generation quality and training efficiency. Unlike ancestral samplers for uniform-state discrete diffusion models, whose sample quality saturates as the number of sampling steps increases, the proposed framework achieves continuously improving sample quality as the number of steps grows. Empirical results demonstrate competitive perplexity on OpenWebText and LM1B, while on CIFAR10 the method outperforms ancestral sampling in FID and Inception Score. Moreover, the approach reduces training time by 25% and cuts memory consumption by 33%.

📝 Abstract
Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the inevitable future of diffusion-based language modeling. Beyond sampling, we develop a memory-efficient curriculum for the Gaussian relaxation training phase, reducing training time by 25% and memory by 33% compared to Duo while maintaining comparable perplexity on OpenWebText and LM1B and strong downstream performance. We release code, checkpoints, and a video tutorial at: https://s-sahoo.com/duo-ch2
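To make the predictor-corrector idea concrete, here is a minimal toy sketch of a PC loop for a uniform-state discrete diffusion model. This is not the paper's actual Ψ-sampler: the `denoiser(x, t)` interface, the linear time grid, and the re-noising schedule in the corrector are all illustrative assumptions. The key structure it shows is the alternation the abstract describes: a predictor step that samples cleaner tokens from the model, followed by corrector steps that inject a little uniform noise and denoise again, letting the model self-correct earlier mistakes.

```python
import numpy as np

def predictor_corrector_sample(denoiser, vocab_size, seq_len, num_steps,
                               corrector_steps=1, rng=None):
    """Toy predictor-corrector loop for uniform-state discrete diffusion.

    `denoiser(x, t)` is a hypothetical callable assumed to return per-token
    categorical probabilities over the vocabulary, shape [seq_len, vocab_size].
    """
    rng = np.random.default_rng(rng)
    # Start from the uniform stationary distribution (t = 1).
    x = rng.integers(vocab_size, size=seq_len)
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        # Predictor: denoise from time t toward time s by sampling
        # each token from the model's predicted distribution.
        probs = denoiser(x, t)
        x = np.array([rng.choice(vocab_size, p=p) for p in probs])
        # Corrector: re-noise a fraction of tokens toward uniform,
        # then denoise again (assumed noise rate shrinks with s).
        for _ in range(corrector_steps):
            flip = rng.random(seq_len) < s
            x = np.where(flip, rng.integers(vocab_size, size=seq_len), x)
            probs = denoiser(x, s)
            x = np.array([rng.choice(vocab_size, p=p) for p in probs])
    return x
```

Because the corrector both adds and removes noise at the same time level, extra corrector iterations keep refining the sample rather than merely advancing the clock, which is the intuition behind quality continuing to improve with more steps.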
Problem

Research questions and friction points this paper is trying to address.

discrete diffusion
sampling quality
ancestral samplers
training efficiency
uniform-state diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Predictor-Corrector samplers
discrete diffusion models
uniform-state diffusion
efficient curriculum training
Ψ-Samplers