Masked Diffusion Models as Energy Minimization

📅 2025-09-17
🤖 AI Summary
This work identifies a connection between masked diffusion models (MDMs) and energy minimization in discrete optimal transport, giving a unified theoretical framework for MDMs under three energy formulations: kinetic energy, conditional kinetic energy, and geodesic energy. The authors prove that the three energies are equivalent under the structure of MDMs, and that MDMs minimize all three when the mask schedule satisfies a closed-form optimality condition. They also parameterize interpolation schedules with Beta distributions, which reduces schedule design to a two-dimensional search and lets sampling be tuned after training without retraining the model. Experiments on synthetic and real-world data show that the energy-inspired schedules outperform hand-crafted schedules, particularly in low-step sampling (≤20 steps), improving the trade-off between computational cost and generation quality.

📝 Abstract
We present a systematic theoretical framework that interprets masked diffusion models (MDMs) as solutions to energy minimization problems in discrete optimal transport. Specifically, we prove that three distinct energy formulations (kinetic, conditional kinetic, and geodesic energy) are mathematically equivalent under the structure of MDMs, and that MDMs minimize all three when the mask schedule satisfies a closed-form optimality condition. This unification not only clarifies the theoretical foundations of MDMs, but also motivates practical improvements in sampling. By parameterizing interpolation schedules via Beta distributions, we reduce the schedule design space to a tractable 2D search, enabling efficient post-training tuning without model modification. Experiments on synthetic and real-world benchmarks demonstrate that our energy-inspired schedules outperform hand-crafted baselines, particularly in low-step sampling settings.
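
To make the Beta-distribution parameterization concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the fraction of tokens still masked after step k of N is 1 - F(k/N; a, b), where F is the Beta(a, b) CDF. That mapping, the function names `beta_cdf` and `beta_mask_schedule`, and the midpoint-rule approximation are all illustrative assumptions.

```python
def beta_cdf(x, a, b, grid=4000):
    """Approximate the Beta(a, b) CDF at x with a midpoint rule (stdlib only)."""
    # Midpoints never touch 0 or 1, where the density can diverge for a, b < 1.
    mids = [(i + 0.5) / grid for i in range(grid)]
    weights = [m ** (a - 1) * (1 - m) ** (b - 1) for m in mids]
    total = sum(weights)
    partial = sum(w for m, w in zip(mids, weights) if m <= x)
    # Dividing by the total makes the result run exactly from 0 to 1.
    return partial / total


def beta_mask_schedule(num_steps, a, b):
    """Hypothetical schedule: fraction of tokens still masked after each step.

    Assumption (not from the source): masked fraction = 1 - BetaCDF(k / N).
    """
    return [1.0 - beta_cdf(k / num_steps, a, b) for k in range(num_steps + 1)]


# a = b = 1 gives the uniform distribution, i.e. a linear unmasking schedule.
linear = beta_mask_schedule(4, 1.0, 1.0)

# Other (a, b) pairs shift where most of the unmasking happens.
skewed = beta_mask_schedule(4, 2.0, 5.0)
```

Under this assumed mapping, the whole schedule is controlled by the two numbers (a, b), which is what makes a 2D search over schedules possible.
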
Problem

Research questions and friction points this paper is trying to address.

Unify masked diffusion models with energy minimization in optimal transport
Prove equivalence of kinetic, conditional kinetic, and geodesic energy formulations
Develop efficient parameterization for optimal mask schedule design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies MDMs with energy minimization theory
Parameterizes schedules via Beta distributions
Enables efficient post-training schedule tuning
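
The post-training tuning described above can be sketched as a plain grid search over the two Beta parameters. This is a hedged illustration only: `grid_search_beta`, `toy_score`, and the candidate grid are hypothetical, and in practice the score would come from sampling a frozen, already trained model with the schedule induced by (a, b) and measuring generation quality.

```python
import itertools


def grid_search_beta(score, a_values, b_values):
    """Return (best_score, a, b) minimizing score(a, b) over a 2D grid."""
    best = None
    for a, b in itertools.product(a_values, b_values):
        s = score(a, b)
        if best is None or s < best[0]:
            best = (s, a, b)
    return best


def toy_score(a, b):
    # Placeholder objective (assumption): stands in for a quality metric
    # computed from samples of a frozen model. Lower is better here.
    return (a - 2.0) ** 2 + (b - 0.5) ** 2


candidates = [0.5, 1.0, 2.0, 4.0]
best_score, best_a, best_b = grid_search_beta(toy_score, candidates, candidates)
```

Because only the sampling schedule changes between evaluations, the model weights are never touched; the cost is one sampling run per (a, b) pair on the grid.
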