The Diffusion Duality

📅 2025-06-12
📈 Citations: 2
Influential: 1
🤖 AI Summary
Uniform-state discrete diffusion models (UDMs) offer inherent self-correction and the potential for fast sampling, yet have long trailed autoregressive and masked diffusion models in generation quality. This work establishes a theoretical duality between UDMs and Gaussian diffusion, enabling principled transfer of training and inference techniques from the continuous setting. Building on this duality, the authors propose a Gaussian-guided curriculum learning strategy whose variance reduction doubles training speed, and a discrete consistency distillation algorithm that unlocks few-step sampling. Models trained this way surpass strong autoregressive baselines in zero-shot perplexity on 3 of 7 standard benchmarks, while sampling is accelerated by two orders of magnitude, producing high-fidelity text in just 1–4 steps. The core contributions are (i) a theoretical characterization of UDMs via their duality with Gaussian diffusion, (ii) a variance-reducing curriculum learning strategy for faster training, and (iii) a discrete consistency distillation framework for ultra-fast, few-step sampling.

📝 Abstract
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and masked diffusion models. In this work, we narrow this performance gap by leveraging a key insight: Uniform-state diffusion processes naturally emerge from an underlying Gaussian diffusion. Our method, Duo, transfers powerful techniques from Gaussian diffusion to improve both training and sampling. First, we introduce a curriculum learning strategy guided by the Gaussian process, doubling training speed by reducing variance. Models trained with curriculum learning surpass autoregressive models in zero-shot perplexity on 3 of 7 benchmarks. Second, we present Discrete Consistency Distillation, which adapts consistency distillation from the continuous to the discrete setting. This algorithm unlocks few-step generation in diffusion language models by accelerating sampling by two orders of magnitude. We provide the code and model checkpoints on the project page: http://s-sahoo.github.io/duo
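The abstract's key insight, that uniform-state diffusion "naturally emerges from an underlying Gaussian diffusion", can be sanity-checked numerically: taking the argmax of a Gaussian-diffused one-hot vector yields, by symmetry, exactly a uniform-state categorical (some mass on the clean token, the remainder spread evenly over the rest). The sketch below is our own illustration of that form, not the paper's code; the vocabulary size, noise level, and variable names are arbitrary choices, and the exact mapping between the Gaussian noise level and the discrete mixing weight is what the paper's duality derives.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8           # toy vocabulary size (illustrative)
alpha = 0.5     # Gaussian signal level at some diffusion time (assumption)
n = 200_000     # Monte Carlo samples

x0 = np.zeros(V)
x0[3] = 1.0     # one-hot encoding of the clean token (index 3)

# Gaussian diffusion latent at this noise level: alpha * x0 + sigma * noise
sigma = np.sqrt(1 - alpha**2)
w = alpha * x0 + sigma * rng.standard_normal((n, V))

# Discretize via argmax: the induced distribution over tokens
tokens = w.argmax(axis=1)
counts = np.bincount(tokens, minlength=V) / n

# Because the noise is i.i.d. across coordinates, every wrong token gets
# equal mass, i.e. the marginal has the uniform-state diffusion form
# Cat(beta * x0 + (1 - beta)/V * 1); the paper's duality pins down beta
# as a function of alpha and V.
```

Running this, `counts[3]` dominates while the other seven entries agree up to Monte Carlo error, matching the uniform-state marginal that the duality predicts.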
Problem

Research questions and friction points this paper is trying to address.

Improving uniform-state diffusion models for text generation
Bridging performance gap with autoregressive models
Enabling faster sampling in diffusion language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Gaussian diffusion for uniform-state processes
Uses curriculum learning to double training speed
Applies Discrete Consistency Distillation for faster sampling
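The distillation idea named above adapts continuous consistency distillation, where a student is trained to map any point on the teacher's sampling trajectory directly to the trajectory's endpoint, with an EMA copy of the student providing the regression target. The toy sketch below shows only that generic continuous mechanism, not the paper's actual Discrete Consistency Distillation (which operates on discrete token sequences); the one-parameter student, the hand-written teacher, and all hyperparameters are our own illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D "teacher": the clean data point is 0, so one teacher sampling
# step shrinks x_t toward 0. A crude stand-in for a pretrained diffusion
# teacher's one-step update.
def teacher_step(x, t, dt):
    return x * (1 - dt / t)

# Student f(x, t) = w * x should learn to jump straight to the endpoint (0).
w = 1.0        # student parameter
w_ema = 1.0    # EMA "target network" copy, as in consistency distillation
lr, decay, dt = 0.05, 0.9, 0.1

for step in range(1000):
    t = rng.uniform(0.2, 1.0)             # random diffusion time
    x_t = rng.standard_normal() * t       # noisy sample at time t
    x_s = teacher_step(x_t, t, dt)        # one teacher step to time t - dt
    target = w_ema * x_s                  # EMA student on the less-noisy point
    pred = w * x_t                        # student on the noisier point
    grad = 2 * (pred - target) * x_t      # d/dw of the squared loss
    w -= lr * grad
    w_ema = decay * w_ema + (1 - decay) * w

# After training, w is near 0: the student maps noisy inputs straight to
# the clean point in a single evaluation, instead of many teacher steps.
```

The self-referential target (student chasing its own EMA one teacher-step ahead) is what collapses a long sampling trajectory into one or a few student evaluations; the paper's contribution is making this recipe work in the discrete token setting.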