Leveraging Pretrained Language Models as Energy Functions for Glauber Dynamics Text Diffusion

📅 2026-05-05

📝 Abstract
We present a discrete diffusion-based language model built on Glauber dynamics from statistical physics. Our main insight is that, instead of training a discrete state-space diffusion model whose forward process is Glauber dynamics with a uniform transition kernel, one can define an "energy function" based on pretrained causal or masked language models. Treating the distribution induced by this energy function as the stationary distribution significantly improves the quality of the generated text. Incorporating UL2 as the pretrained model in our diffusion pipeline, we outperform prior diffusion-based LMs and perform competitively with autoregressive models of comparable size. Furthermore, our models are competitive with or outperform prior diffusion models and GPT-2-style autoregressive models on zero-shot commonsense reasoning tasks, as well as on planning and search tasks such as Sudoku and Zebra puzzles.
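To make the idea of Glauber dynamics with an energy function concrete, here is a minimal sketch of the generic single-site (heat-bath) update, in which one position is resampled from a Boltzmann distribution over the vocabulary. This is not the paper's implementation: `toy_energy`, `glauber_step`, `run_glauber`, and the `temperature` parameter are hypothetical names introduced for illustration, and the toy energy merely stands in for an energy derived from a pretrained language model.

```python
import math
import random


def toy_energy(tokens):
    """Hypothetical stand-in energy: number of adjacent positions that differ.

    In the setting the abstract describes, the energy would instead be derived
    from a pretrained causal or masked language model (an assumption here).
    """
    return sum(1.0 for a, b in zip(tokens, tokens[1:]) if a != b)


def glauber_step(tokens, vocab, energy, rng, temperature=1.0):
    """Resample one uniformly chosen position from its conditional distribution."""
    i = rng.randrange(len(tokens))
    # Score every candidate token at position i with all other positions fixed.
    candidates = [tokens[:i] + [v] + tokens[i + 1:] for v in vocab]
    energies = [energy(c) for c in candidates]
    # Subtract the minimum energy before exponentiating for numerical stability.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    new_token = rng.choices(vocab, weights=weights, k=1)[0]
    return tokens[:i] + [new_token] + tokens[i + 1:]


def run_glauber(length, vocab, energy, steps, seed=0, temperature=1.0):
    """Start from uniformly random tokens and apply `steps` single-site updates."""
    rng = random.Random(seed)
    tokens = [rng.choice(vocab) for _ in range(length)]
    for _ in range(steps):
        tokens = glauber_step(tokens, vocab, energy, rng, temperature)
    return tokens
```

Calling `run_glauber(8, ["a", "b", "c"], toy_energy, steps=200)` returns a list of 8 tokens. Because each update samples from the exact conditional of the distribution proportional to exp(-E/T), that distribution is stationary for the chain, which is the sense in which an energy function determines the stationary distribution.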
Problem

Research questions and friction points this paper is trying to address.

discrete diffusion
Glauber dynamics
energy function
pretrained language models
text generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Glauber dynamics
energy-based modeling
discrete diffusion
pretrained language models
text generation