🤖 AI Summary
This work proposes the first continuous diffusion language model that matches the performance of state-of-the-art discrete diffusion models, closing a longstanding gap in language modeling capability. The approach connects diffusion in the embedding space to flow matching via Bregman divergence and introduces three key technical innovations: an ODE-based bound on negative log-likelihood for evaluation, an information-uniform noise scheduler based on a Gumbel distribution, and a training protocol incorporating self-conditioning. Evaluated on LM1B and OpenWebText, the model achieves perplexities of 30.0 and 24.6, respectively, on par with the best discrete diffusion models of comparable scale, and it surpasses autoregressive baselines in zero-shot transfer across multiple benchmarks.
📝 Abstract
Continuous diffusion models have achieved strong performance across domains such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete diffusion. Our approach connects embedding-space DLMs to Flow Matching via Bregman divergence and introduces three key innovations: (1) a novel ODE-based NLL bound for principled evaluation of continuous flow-based language models; (2) an information-uniform principle for noise scheduling, motivating a learnable scheduler based on a Gumbel distribution; and (3) an improved training protocol incorporating self-conditioning, which enhances both likelihood and sample quality. LangFlow achieves strong performance across benchmarks, reaching a perplexity (PPL) of 30.0 on LM1B and 24.6 on OpenWebText. It matches top discrete DLMs at comparable scale and surpasses autoregressive baselines in zero-shot transfer across multiple benchmarks. LangFlow provides clear evidence that continuous diffusion is a competitive and promising paradigm for language modeling.
https://github.com/nealchen2003/LangFlow
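The perplexity figures quoted in the abstract are tied to the NLL bound by the standard definition: perplexity is the exponential of the average per-token negative log-likelihood. Because the exponential is monotone, an upper bound on NLL yields an upper bound on perplexity. The snippet below is a minimal sketch of that conversion only; it is not taken from the LangFlow repository, the function names are hypothetical, and it assumes NLL is measured in nats.

```python
import math


def perplexity_from_nll(total_nll_nats: float, num_tokens: int) -> float:
    """Return exp(average per-token NLL), assuming NLL is given in nats."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(total_nll_nats / num_tokens)


def nll_per_token_from_perplexity(ppl: float) -> float:
    """Inverse map: average per-token NLL (nats) implied by a perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


# Per-token NLL (in nats) implied by the perplexities reported in the abstract.
lm1b_nll = nll_per_token_from_perplexity(30.0)
owt_nll = nll_per_token_from_perplexity(24.6)
```

If an evaluation reports only an upper bound on the total NLL, passing that bound to `perplexity_from_nll` gives the corresponding upper bound on perplexity.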