Information-Theoretic Discrete Diffusion

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of rigorous theoretical foundations for likelihood estimation in discrete diffusion models. We establish the first information-theoretic framework that formally links score-matching losses, namely denoising score entropy (DSE) and denoising cross-entropy (DCE), to log-likelihood estimation. Specifically, we introduce the information–minimum denoising score entropy (I-MDSE) and information–minimum denoising cross-entropy (I-MDCE) relations, proving for the first time in the discrete setting that these standard losses yield tight, principled estimators of the log-likelihood. Our framework supports time-integral decomposition, time-free estimation, conditional likelihood estimation, and coupled Monte Carlo estimation of likelihood ratios. We design a differentiable training pipeline and empirically validate its accuracy and low variance on both synthetic and real-world datasets, achieving notable improvements in likelihood estimation and conditional generation. The implementation is publicly available.

📝 Abstract
We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Inspired by the I-MMSE identity for the Gaussian setup, we derive analogous results for the discrete setting. Specifically, we introduce the Information-Minimum Denoising Score Entropy (I-MDSE) relation, which links the mutual information between data and its diffused version to the minimum denoising score entropy (DSE) loss. We extend this theory to masked diffusion and establish the Information-Minimum Denoising Cross-Entropy (I-MDCE) relation, connecting cross-entropy losses to mutual information in discrete masked processes. These results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses, showing that commonly used losses such as DSE and DCE are not merely variational bounds but tight and principled estimators of log-likelihood. The I-MDCE decomposition further enables practical extensions, including a time-free formula, conditional likelihood estimation in prompt-response tasks, and coupled Monte Carlo estimation of likelihood ratios. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators. The code is publicly available at https://github.com/Dongjae0324/infodis.
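For context, the classical Gaussian I-MMSE identity (Guo, Shamai, and Verdú) that the abstract refers to can be stated as:

```latex
% I-MMSE identity: the derivative of mutual information with respect to
% the signal-to-noise ratio equals half the minimum mean-square error.
\frac{\mathrm{d}}{\mathrm{d}\,\mathrm{snr}}\,
  I\!\left(X;\ \sqrt{\mathrm{snr}}\,X + N\right)
  = \tfrac{1}{2}\,\mathrm{mmse}(\mathrm{snr}),
\qquad N \sim \mathcal{N}(0,1),
```

so that integrating over snr expresses mutual information as an integral of the optimal denoising loss. The I-MDSE and I-MDCE relations play the analogous role in the discrete setting, with the DSE and DCE losses in place of the MMSE; the exact discrete forms are given in the paper and are not reproduced here.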
Problem

Research questions and friction points this paper is trying to address.

Developing an information-theoretic framework for discrete diffusion models
Establishing principled estimators of log-likelihood from score-matching losses
Extending the theory to masked diffusion and enabling practical applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Develops an information-theoretic framework for discrete diffusion
Introduces the I-MDSE relation linking mutual information to the minimum denoising score entropy loss
Establishes the I-MDCE relation connecting cross-entropy losses to mutual information
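The time-integral decomposition described above can be illustrated on a toy masked-diffusion example. The sketch below is not the paper's implementation: it assumes a linear masking schedule, a single sequence with known factorized marginals (so the optimal denoiser is available in closed form), and takes the expectation over random masks analytically; the function name and weighting are hypothetical simplifications.

```python
import math
import random


def nll_time_integral(token_probs, x, n_t=1000, rng=None):
    """Monte Carlo estimate of -log p(x) as a time integral of a
    weighted denoising cross-entropy (toy sketch, not the paper's code).

    token_probs: per-position marginal distributions (optimal denoiser
                 for factorized data); x: observed token indices.
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_t):
        t = rng.uniform(1e-6, 1.0)  # diffusion time, sampled uniformly
        # Linear schedule: each token is masked with probability t, and
        # the optimal denoiser predicts the marginal p_i, so the
        # expected cross-entropy over masks is sum_i t * (-log p_i(x_i)).
        expected_ce = sum(t * -math.log(p[xi]) for p, xi in zip(token_probs, x))
        # 1/t weight from the time-integral decomposition; for the
        # optimal denoiser the weighted integrand is constant in t.
        total += expected_ce / t
    return total / n_t
```

Because the mask expectation is computed in closed form, every time sample contributes exactly the true negative log-likelihood, which makes the tightness of the decomposition at the optimal denoiser easy to check numerically; a learned denoiser would instead yield an upper bound.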