Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models

📅 2026-02-23
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a central challenge in evaluating discrete diffusion language models: existing metrics conflate denoiser approximation error with the distributional bias introduced by the sampler, obscuring whether the sampling procedure itself is correct. To disentangle these factors, the authors propose a sampler-centric evaluation framework built on an oracle denoiser: the exact posterior of a hidden Markov model (HMM) whose ground truth is a known Markov chain, so any distributional error that remains is attributable solely to the sampling process. Experiments reveal that even under this ideal denoiser, few-step samplers exhibit substantial distributional inconsistency, a flaw largely undetected by conventional generation metrics such as negative log-likelihood, perplexity, or MAUVE. The study concludes that current few-step discrete diffusion samplers are distributionally incorrect, with errors vanishing only as the number of sampling steps approaches the sequence length.

📝 Abstract
Discrete diffusion language models (dLLMs) provide a fast and flexible alternative to autoregressive models (ARMs) via iterative denoising with parallel updates. However, their evaluation is challenging: existing metrics conflate denoiser approximation error with sampler-induced error from the sampling dynamics, a problem that does not arise for ARMs whose autoregressive sampling exactly reflects the learned probability model. We introduce a sampler-centric oracle framework that replaces learned denoisers with an exact Hidden Markov Model posterior derived from a ground-truth Markov chain, isolating sampler-induced error in a controlled setting. We show that few-step discrete diffusion samplers are not distributionally correct even under an oracle denoiser, with transition-level mismatch that vanishes only as the number of steps approaches the sequence length. Moreover, improvements in negative log-likelihood, generative perplexity, or MAUVE do not imply correct sampling. Code is available at https://luhantang.github.io/dllm_sampler
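The core failure mode the abstract describes can be seen in a toy version of the oracle setup. The following sketch (not the paper's code; the transition matrix and sequence length are hypothetical choices) builds a 2-symbol ground-truth Markov chain, computes exact per-position posteriors, and compares a one-step "parallel" sampler, which fills both masked positions independently from their exact marginals, against a sequential sampler that reveals one position per step. Even with oracle marginals, the parallel sampler's joint distribution is wrong; only the full-length schedule is exact.

```python
import numpy as np

# Hypothetical ground-truth Markov chain over a 2-symbol vocabulary.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # P[i, j] = Pr(x_{t+1} = j | x_t = i)

# Stationary distribution pi: the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# True joint over a length-2 sequence started from stationarity:
# Pr(x1, x2) = pi(x1) * P(x1, x2).
true_joint = pi[:, None] * P

# One-step "parallel" sampler: both masked positions are filled
# independently from their exact per-position marginals (the oracle
# denoiser's output). The marginal of x2 under stationarity is again pi,
# so the sampled joint factorizes and ignores the x1-x2 dependence.
parallel_joint = pi[:, None] * pi[None, :]

# Sequential sampler (steps == sequence length): reveal x1 from its
# marginal, then draw x2 from the exact conditional P(x1, .).
sequential_joint = pi[:, None] * P

# Total variation distance to the true joint quantifies sampler error.
tv_parallel = 0.5 * np.abs(true_joint - parallel_joint).sum()
tv_sequential = 0.5 * np.abs(true_joint - sequential_joint).sum()
print(f"TV(parallel, true)   = {tv_parallel:.4f}")   # strictly positive
print(f"TV(sequential, true) = {tv_sequential:.4f}")
```

Note that the gap is computed analytically, with no Monte Carlo sampling: the mismatch is a property of the sampler's factorization, not of estimation noise, which mirrors the paper's point that the error persists even with a perfect denoiser.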
Problem

Research questions and friction points this paper is trying to address.

discrete diffusion language models
sampler evaluation
sampling error
distributional correctness
denoiser approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion language models
sampler-centric evaluation
oracle denoiser
sampling correctness
Hidden Markov Model posterior