Continuous Latent Diffusion Language Model

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text generation approaches struggle to combine efficient generation, scalable representation learning, and global semantic modeling. This work proposes Cola DLM, a hierarchical continuous latent diffusion language model with three stages: a Text VAE learns a stable mapping from text to a continuous latent space, a block-causal DiT models the global semantic prior in that space, and a non-autoregressive conditional decoder produces the text. By treating diffusion as prior transport in latent space rather than token-level observation recovery, the method decouples global semantic organization from local textual realization, which gives a more flexible non-autoregressive inductive bias and extends naturally to other continuous modalities. In experiments covering 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, the authors identify an effective overall configuration of Cola DLM and report strong scaling behavior, positioning it as a principled alternative to strictly token-level language modeling.
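The decoupling described in the summary can be read as a standard two-level latent-variable factorization. The block below is only a sketch of that reading, not the paper's own formulation: the symbols $x$ (text), $z$ (continuous latent), $p_\psi$ (the prior fitted by the DiT), and $p_\theta$ (the conditional decoder) are notation assumed here for illustration.

```latex
% Assumed notation: x = text sequence, z = continuous latent,
% p_psi = latent prior modeled by diffusion, p_theta = conditional decoder.
\begin{align}
  p(x) &= \int p_\theta(x \mid z)\, p_\psi(z)\, \mathrm{d}z, \\
  z &\sim p_\psi(z) \quad \text{(global semantic organization, sampled in latent space)}, \\
  x &\sim p_\theta(x \mid z) \quad \text{(local textual realization, conditional decoding)}.
\end{align}
```

Under this reading, the diffusion model only has to fit the prior over $z$, while all token-level detail is left to the decoder.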

📝 Abstract

Large language models have achieved remarkable success under the autoregressive paradigm, yet high-quality text generation need not be tied to a fixed left-to-right order. Existing alternatives still struggle to jointly achieve generation efficiency, scalable representation learning, and effective global semantic modeling. We propose Cola DLM, a hierarchical latent diffusion language model that frames text generation through hierarchical information decomposition. Cola DLM first learns a stable text-to-latent mapping with a Text VAE, then models a global semantic prior in continuous latent space with a block-causal DiT, and finally generates text through conditional decoding. From a unified Markov-path perspective, its diffusion process performs latent prior transport rather than token-level observation recovery, thereby separating global semantic organization from local textual realization. This design yields a more flexible non-autoregressive inductive bias, supports semantic compression and prior fitting in continuous space, and naturally extends to other continuous modalities. Through experiments spanning 4 research questions, 8 benchmarks, strictly matched ~2B-parameter autoregressive and LLaDA baselines, and scaling curves up to about 2000 EFLOPs, we identify an effective overall configuration of Cola DLM and verify its strong scaling behavior for text generation. Taken together, the results establish hierarchical continuous latent prior modeling as a principled alternative to strictly token-level language modeling, where generation quality and scaling behavior may better reflect model capability than likelihood, while also suggesting a concrete path toward unified modeling across discrete text and continuous modalities.
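To make the term "block-causal" concrete, here is a minimal sketch of one common form of block-causal attention mask: latents are grouped into contiguous fixed-size blocks, attention is unrestricted within a block, and a block can only look back at earlier blocks. The abstract does not state the block size or the exact masking rule used in Cola DLM, so the function name `block_causal_mask` and its parameters `num_latents` and `block_size` are hypothetical and chosen only for illustration.

```python
def block_causal_mask(num_latents: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when latent i may attend to latent j.

    Assumption (not taken from the paper): latents are split into contiguous
    blocks of `block_size`; attention is bidirectional inside a block and
    causal across blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Index of the block each latent position belongs to.
    block_of = [i // block_size for i in range(num_latents)]
    # Position i sees position j if j's block is not later than i's block.
    return [
        [block_of[j] <= block_of[i] for j in range(num_latents)]
        for i in range(num_latents)
    ]


mask = block_causal_mask(num_latents=6, block_size=2)
for row in mask:
    # "x" marks an allowed attention edge, "." a masked one.
    print("".join("x" if allowed else "." for allowed in row))
```

With `block_size=1` this reduces to an ordinary causal mask, and with `block_size=num_latents` it becomes fully bidirectional, which is one way to see block-causal attention as a middle ground between the two.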

Problem

Research questions and friction points this paper is trying to address.

non-autoregressive generation

global semantic modeling

latent diffusion

text generation

representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Diffusion

Non-autoregressive Generation

Hierarchical Representation