Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models

📅 2026-02-23
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a central challenge in evaluating discrete diffusion language models: existing metrics conflate denoiser approximation error with the distributional bias introduced by the sampler, obscuring whether the sampling procedure itself is correct. To disentangle these factors, the authors propose a sampler-centric evaluation framework built on an oracle denoiser: the exact posterior of a hidden Markov model (HMM) whose ground truth is a known Markov chain, so any distributional error that remains is attributable solely to the sampling process. Experiments reveal that even under this ideal denoiser, few-step samplers exhibit substantial distributional inconsistency, a flaw largely undetected by conventional generation metrics such as negative log-likelihood, perplexity, or MAUVE. The study concludes that current few-step discrete diffusion samplers are distributionally incorrect, with errors vanishing only as the number of sampling steps approaches the sequence length.

📝 Abstract
Discrete diffusion language models (dLLMs) provide a fast and flexible alternative to autoregressive models (ARMs) via iterative denoising with parallel updates. However, their evaluation is challenging: existing metrics conflate denoiser approximation error with sampler-induced error from the sampling dynamics, a problem that does not arise for ARMs whose autoregressive sampling exactly reflects the learned probability model. We introduce a sampler-centric oracle framework that replaces learned denoisers with an exact Hidden Markov Model posterior derived from a ground-truth Markov chain, isolating sampler-induced error in a controlled setting. We show that few-step discrete diffusion samplers are not distributionally correct even under an oracle denoiser, with transition-level mismatch that vanishes only as the number of steps approaches the sequence length. Moreover, improvements in negative log-likelihood, generative perplexity, or MAUVE do not imply correct sampling. Code is available at https://luhantang.github.io/dllm_sampler
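The core failure mode the abstract describes can be seen in a toy version of the oracle setup. The following sketch (not the paper's code; the transition matrix and sequence length are hypothetical choices) builds a 2-symbol ground-truth Markov chain, computes exact per-position posteriors, and compares a one-step "parallel" sampler, which fills both masked positions independently from their exact marginals, against a sequential sampler that reveals one position per step. Even with oracle marginals, the parallel sampler's joint distribution is wrong; only the full-length schedule is exact.

```python
import numpy as np

# Hypothetical ground-truth Markov chain over a 2-symbol vocabulary.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])  # P[i, j] = Pr(x_{t+1} = j | x_t = i)

# Stationary distribution pi: the left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# True joint over a length-2 sequence started from stationarity:
# Pr(x1, x2) = pi(x1) * P(x1, x2).
true_joint = pi[:, None] * P

# One-step "parallel" sampler: both masked positions are filled
# independently from their exact per-position marginals (the oracle
# denoiser's output). The marginal of x2 under stationarity is again pi,
# so the sampled joint factorizes and ignores the x1-x2 dependence.
parallel_joint = pi[:, None] * pi[None, :]

# Sequential sampler (steps == sequence length): reveal x1 from its
# marginal, then draw x2 from the exact conditional P(x1, .).
sequential_joint = pi[:, None] * P

# Total variation distance to the true joint quantifies sampler error.
tv_parallel = 0.5 * np.abs(true_joint - parallel_joint).sum()
tv_sequential = 0.5 * np.abs(true_joint - sequential_joint).sum()
print(f"TV(parallel, true)   = {tv_parallel:.4f}")   # strictly positive
print(f"TV(sequential, true) = {tv_sequential:.4f}")
```

Note that the gap is computed analytically, with no Monte Carlo sampling: the mismatch is a property of the sampler's factorization, not of estimation noise, which mirrors the paper's point that the error persists even with a perfect denoiser.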
Problem

Research questions and friction points this paper is trying to address.

discrete diffusion language models
sampler evaluation
sampling error
distributional correctness
denoiser approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion language models
sampler-centric evaluation
oracle denoiser
sampling correctness
Hidden Markov Model posterior