TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models

📅 2026-05-22
🤖 AI Summary
Discrete diffusion language models do not admit exact computation of the log-likelihood. Existing evaluations therefore rely on the evidence lower bound (ELBO), which leaves unclear how far the true likelihood lies above the reported value. This work proposes the Tangent Upper Bound on Evidence (TUBE), a variational upper bound on the log-likelihood that admits unbiased Monte Carlo estimation. It applies to a broad class of latent-variable models, including masked diffusion models, any-order autoregressive models, and their block-wise variants. Because TUBE provides a computable upper bound for discrete diffusion models, it can show that block-wise masked diffusion and block-wise any-order autoregressive models achieve strictly lower log-likelihoods than exact autoregressive baselines, confirming that autoregressive models still lead in likelihood.
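The summary's comparison with autoregressive baselines is conclusive only because TUBE bounds the log-likelihood from above. The notation below is ours, not necessarily the paper's. $p_\theta$ is the block model being evaluated, $\mathcal{L}_\theta$ is its ELBO, $\mathcal{U}_\theta$ is any valid upper bound such as TUBE, and $\log p_{\mathrm{ARM}}$ is the exactly computed ARM log-likelihood:

```latex
\begin{gather*}
\mathcal{L}_\theta(x) \;\le\; \log p_\theta(x) \;\le\; \mathcal{U}_\theta(x), \\
\mathcal{U}_\theta(x) < \log p_{\mathrm{ARM}}(x)
  \;\Longrightarrow\; \log p_\theta(x) < \log p_{\mathrm{ARM}}(x).
\end{gather*}
```

A lower bound alone cannot support this conclusion. $\mathcal{L}_\theta(x) < \log p_{\mathrm{ARM}}(x)$ holds whether $\log p_\theta(x)$ sits above or below the ARM value. When $\mathcal{U}_\theta$ is replaced by a Monte Carlo estimate, the conclusion additionally requires the gap to exceed the estimate's sampling error.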
📝 Abstract
Log-likelihood is a standard metric for evaluating generative models. Unfortunately, in contrast to autoregressive models (ARMs), discrete diffusion models generally do not admit exact computation of this quantity. Existing evaluations therefore rely on the evidence lower bound (ELBO), leaving unclear how much higher the true value may be. We address this by introducing the Tangent Upper Bound on Evidence (TUBE), a variational upper bound on log-likelihood that admits an unbiased Monte Carlo estimator. TUBE applies broadly to latent-variable models, including masked diffusion models (MDMs), any-order ARMs (AO-ARMs), and block variants of both. Applied to block MDMs and block AO-ARMs, TUBE yields our key empirical finding: their log-likelihoods lie strictly below the exact ARM baseline, showing that ARMs still dominate in likelihood.
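The abstract's central technical claim is that TUBE is an upper bound with an unbiased Monte Carlo estimator. TUBE's exact construction is not given here, so the sketch below illustrates only a generic tangent-line bound suggested by the name. It should not be read as TUBE itself. Because log is concave, its tangent at any anchor a > 0 lies above it: log y ≤ log a + y/a − 1, with equality at y = a. Setting y = p(x) = E_q[p(x, z)/q(z)] gives a bound that is linear in an expectation, so averaging importance weights estimates it without bias. The toy model, the proposal q, the pilot-run anchor rule, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical toy latent-variable model, small enough to enumerate exactly:
# latent z in {0, ..., K-1}, observation x in {0, ..., V-1}.
K, V = 4, 6
rng = np.random.default_rng(0)
log_prior = np.log(rng.dirichlet(np.ones(K)))        # log p(z), shape (K,)
log_lik = np.log(rng.dirichlet(np.ones(V), size=K))  # log p(x | z), shape (K, V)


def log_joint(x):
    """log p(x, z) for every z, shape (K,)."""
    return log_prior + log_lik[:, x]


def exact_log_marginal(x):
    """Exact log p(x) by summing over z (only feasible for tiny K)."""
    return np.logaddexp.reduce(log_joint(x))


def exact_elbo(x, log_q):
    """ELBO = E_q[log p(x, z) - log q(z)], computed exactly by enumeration."""
    return np.sum(np.exp(log_q) * (log_joint(x) - log_q))


def exact_tangent_bound(x, log_q, log_a):
    """log a - 1 + E_q[w] / a with w = p(x, z) / q(z); valid for any a > 0."""
    mean_w = np.sum(np.exp(log_q) * np.exp(log_joint(x) - log_q))  # equals p(x)
    return log_a - 1.0 + mean_w / np.exp(log_a)


def mc_tangent_bound(x, log_q, log_a, n, rng):
    """Unbiased Monte Carlo estimate of exact_tangent_bound from z_i ~ q."""
    z = rng.choice(K, size=n, p=np.exp(log_q))
    log_w = log_joint(x)[z] - log_q[z]
    # Averaging w / a (rather than log w) is what keeps the estimate unbiased.
    return log_a - 1.0 + np.mean(np.exp(log_w - log_a))


def pilot_log_anchor(x, log_q, n, rng):
    """Hypothetical anchor rule: log of the mean weight from an independent pilot run."""
    z = rng.choice(K, size=n, p=np.exp(log_q))
    return np.logaddexp.reduce(log_joint(x)[z] - log_q[z]) - np.log(n)


x = 2
# A proposal that ignores x: half prior, half uniform, so q(z) >= 1 / (2K).
log_q = np.log(0.5 * np.exp(log_prior) + 0.5 / K)
log_a = pilot_log_anchor(x, log_q, 2_000, rng)

lower = exact_elbo(x, log_q)
exact = exact_log_marginal(x)
upper = exact_tangent_bound(x, log_q, log_a)
estimate = mc_tangent_bound(x, log_q, log_a, 100_000, rng)
print(f"ELBO {lower:.4f} <= log p(x) {exact:.4f} <= bound {upper:.4f} (MC {estimate:.4f})")
```

The familiar importance-sampled estimate log((1/N) Σ w_i) is biased downward by Jensen's inequality, so it behaves as a stochastic lower bound and cannot certify an upper bound. Linearizing the logarithm around a removes that bias. The anchor affects only tightness. Any a > 0 gives a valid bound, and the gap log(a/p(x)) − 1 + p(x)/a vanishes as a approaches p(x). That is why the sketch draws a from a pilot run that is independent of the samples used in the final estimate.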
Problem

discrete diffusion models
log-likelihood
evidence lower bound
generative models
model evaluation
Innovation

TUBE
discrete diffusion models
log-likelihood upper bound
variational inference
masked diffusion models