Non-Markovian Discrete Diffusion with Causal Language Models

📅 2025-02-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Discrete diffusion models suffer from limited expressiveness and inferior performance compared to autoregressive causal language models (CLMs) when modeling structured sequences. To address this, we propose CaDDi, the first non-Markovian discrete diffusion framework. Methodologically, CaDDi integrates causal masking into the diffusion process, so that denoising trajectories inherently capture sequential dependencies; it formally unifies CLMs as a special case of the framework, allowing pretrained large language models (LLMs) to be incorporated directly without architectural modification. The approach combines non-Markovian transition kernels, discrete sequence denoising, and causality-constrained modeling. Experiments show that CaDDi significantly outperforms existing discrete diffusion models on both natural language and biological sequence tasks, while substantially narrowing the performance gap to state-of-the-art autoregressive LLMs.
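To make the "non-Markovian" distinction concrete, here is a toy sketch (not the paper's implementation; all names such as `corrupt`, `trajectory`, and `denoiser_input` are illustrative assumptions): a Markovian denoiser sees only the latest noisy state, whereas a non-Markovian one conditions on the entire noisy trajectory, which a causal LM can consume as one flat sequence.

```python
# Toy illustration of non-Markovian conditioning in discrete diffusion.
# This is a hypothetical sketch, NOT CaDDi's actual forward/reverse process.
import random

VOCAB = list(range(10))   # toy vocabulary of 10 token ids
MASK = -1                 # special absorbing "mask" token

def corrupt(tokens, p):
    """One forward step: independently mask each token with probability p."""
    return [MASK if random.random() < p else t for t in tokens]

def trajectory(x0, steps, p=0.3):
    """Run the forward chain, keeping every intermediate state.
    A Markovian denoiser would see only traj[-1]; a non-Markovian
    one may condition on the whole list."""
    traj = [x0]
    for _ in range(steps):
        traj.append(corrupt(traj[-1], p))
    return traj

def denoiser_input(traj, t):
    """Non-Markovian conditioning: concatenate the noisy states
    x_t, ..., x_T into one flat sequence, so a causal language model
    can attend across the full denoising history."""
    flat = []
    for state in traj[t:]:
        flat.extend(state)
    return flat

random.seed(0)
x0 = [random.choice(VOCAB) for _ in range(8)]  # clean length-8 sequence
traj = trajectory(x0, steps=4)                 # 5 states: x_0 ... x_4
inp = denoiser_input(traj, t=1)                # 4 states * 8 tokens each
```

The key design point the sketch highlights: because the conditioning input is just a longer token sequence, a pretrained causal LM can process it without architectural changes, which is the property the summary attributes to CaDDi.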

πŸ“ Abstract
Discrete diffusion models have emerged as a flexible and controllable paradigm for structured sequence modeling, yet they still lag behind causal language models in expressiveness. To bridge the gap between the two paradigms, we introduce CaDDi, a causal discrete diffusion model that unifies sequential and temporal modeling within a non-Markovian diffusion framework. Unlike conventional diffusion models that operate step by step with no access to prior states, CaDDi integrates the temporal trajectory, enabling more expressive and controllable generation. Our approach also treats causal language models as a special case, allowing seamless adoption of pretrained large language models (LLMs) for discrete diffusion without the need for architectural modifications. Empirically, we demonstrate that CaDDi outperforms state-of-the-art discrete diffusion models on both natural language and biological sequence tasks, narrowing the gap between diffusion-based methods and large-scale autoregressive transformers.
Problem

Research questions and friction points this paper is trying to address.

Bridges the gap between diffusion and causal language models
Enhances sequence modeling expressiveness and control
Integrates large language models seamlessly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies sequential and temporal modeling
Integrates temporal trajectory for generation
Adopts pretrained LLMs without modifications
Yangtian Zhang
Yale University
Generative Models · Graph Representation Learning
Sizhuang He
Yale University, New Haven, CT, USA
Daniel Levine
Yale University, New Haven, CT, USA
Lawrence Zhao
Yale University, New Haven, CT, USA
David Zhang
Yale University, New Haven, CT, USA
S. Rizvi
Yale University, New Haven, CT, USA
E. Zappala
Idaho State University, Pocatello, ID, USA
Rex Ying
Yale University, New Haven, CT, USA
David van Dijk
Assistant Professor, Yale University
machine learning · computational biology