Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner

📅 2025-10-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Continuous diffusion language models possess superior theoretical expressivity but underperform discrete diffusion and recurrent Transformer models in practice due to the inherent difficulty of decoding from continuous latent spaces to discrete tokens. Method: We propose a co-evolutionary diffusion framework that jointly models continuous latent and discrete token spaces within a single architecture, enabling synchronized denoising and cross-space information exchange. Our approach introduces a multimodal joint diffusion mechanism, a co-denoising architecture, and enhanced training and sampling strategies to balance implicit reasoning capability with generation fidelity. Contribution/Results: We theoretically prove that our model’s expressivity strictly dominates both discrete diffusion and recurrent Transformers. Empirically, it achieves significant improvements in generation quality, convergence speed, and training stability across multiple language modeling benchmarks—marking the first work to bridge the gap between theoretical expressivity and practical performance in diffusion-based language modeling.

Technology Category

Application Category

📝 Abstract
Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and primary empirical results showing the advantages of latent reasoning with looped transformers or continuous chain-of-thoughts, continuous diffusion models typically underperform their discrete counterparts. In this paper, we argue that diffusion language models do not necessarily need to be in the discrete space. In particular, we prove that continuous diffusion models have stronger expressivity than discrete diffusions and looped transformers. We attribute the contradiction between the theoretical expressiveness and empirical performance to their practical trainability: while continuous diffusion provides intermediate supervision that looped transformers lack, they introduce additional difficulty decoding tokens into the discrete token space from the continuous representation space. We therefore propose Coevolutionary Continuous Discrete Diffusion (CCDD), which defines a joint multimodal diffusion process on the union of a continuous representation space and a discrete token space, leveraging a single model to simultaneously denoise in the joint space. By combining two modalities, CCDD is expressive with rich semantics in the latent space, as well as good trainability and sample quality with the help of explicit discrete tokens. We also propose effective architectures and advanced training/sampling techniques for CCDD, which reveals strong empirical performance in extensive language modeling experiments on real-world tasks.
Problem

Research questions and friction points this paper is trying to address.

Bridging continuous and discrete spaces in diffusion language models
Enhancing latent reasoning through joint multimodal diffusion processes
Improving trainability and sample quality in continuous diffusion frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint multimodal diffusion in continuous and discrete spaces
Simultaneous denoising using a single model architecture
Enhanced expressivity and trainability with explicit tokens
🔎 Similar Papers
No similar papers found.