LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes LangFlow, which the authors describe as the first continuous diffusion language model to rival state-of-the-art discrete diffusion models, closing a longstanding performance gap. The approach connects embedding-space diffusion to Flow Matching via Bregman divergence and introduces three technical innovations: an ODE-based bound on negative log-likelihood for evaluation, a learnable noise scheduler based on a Gumbel distribution and motivated by an information-uniform principle, and a training protocol incorporating self-conditioning. On LM1B and OpenWebText the model reaches perplexities of 30.0 and 24.6, respectively, matching top discrete diffusion models at comparable scale, and it surpasses autoregressive baselines in zero-shot transfer across multiple benchmarks.

📝 Abstract

Continuous diffusion models have achieved strong performance across domains such as images. However, in language modeling, prior continuous diffusion language models (DLMs) lag behind discrete counterparts. In this work, we close this gap with LangFlow, the first continuous DLM to rival discrete diffusion. Our approach connects embedding-space DLMs to Flow Matching via Bregman divergence and introduces three key innovations: (1) a novel ODE-based NLL bound for principled evaluation of continuous flow-based language models; (2) an information-uniform principle for noise scheduling, motivating a learnable scheduler based on a Gumbel distribution; and (3) an improved training protocol incorporating self-conditioning, which enhances both likelihood and sample quality. LangFlow achieves strong performance across benchmarks, reaching a perplexity (PPL) of 30.0 on LM1B and 24.6 on OpenWebText. It matches top discrete DLMs at comparable scale and surpasses autoregressive baselines in zero-shot transfer across multiple benchmarks. LangFlow provides clear evidence that continuous diffusion is a competitive and promising paradigm for language modeling. Code: https://github.com/nealchen2003/LangFlow
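The abstract reports perplexities obtained from an NLL bound. As a reading aid, the sketch below shows only the standard relationship between the two quantities: perplexity is the exponential of the average per-token negative log-likelihood, so an upper bound on NLL gives an upper bound on perplexity. The function names are hypothetical and are not taken from the LangFlow repository. The sketch assumes the NLL is measured in nats and normalized per token, and the abstract does not state the tokenization.

```python
import math


def perplexity_from_nll(total_nll_nats: float, num_tokens: int) -> float:
    """Return exp(average per-token NLL); the NLL is assumed to be in nats."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(total_nll_nats / num_tokens)


def nll_per_token_from_perplexity(ppl: float) -> float:
    """Invert the relationship: per-token NLL in nats implied by a perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


if __name__ == "__main__":
    # Perplexities quoted in the abstract; the per-token normalization is assumed.
    for name, ppl in [("LM1B", 30.0), ("OpenWebText", 24.6)]:
        nats = nll_per_token_from_perplexity(ppl)
        bits = nats / math.log(2)  # convert nats to bits
        print(f"{name}: PPL {ppl} -> {nats:.3f} nats/token, {bits:.3f} bits/token")
```

Because the exponential is monotone, any slack in the NLL bound carries over to the reported perplexity. The true perplexity of the model can only be equal to or lower than the value computed from an upper bound.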

Problem

Research questions and friction points this paper is trying to address.

continuous diffusion

language modeling

discrete diffusion

diffusion language models

generative modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

continuous diffusion

language modeling

flow matching