Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing discrete diffusion models do not fully exploit the natural decomposition of a continuous-time Markov chain (CTMC) reverse process into jump timing and jump direction. This work explicitly splits the reverse process into an exit rate, which governs when a jump occurs, and a jump distribution, which determines where it lands, and models each component with a dedicated head of a two-headed neural network. With this parameterization, the training objective aligns directly with a KL divergence over path space: the evidence lower bound (ELBO) differs from that KL only by a parameter-independent constant, and the KL itself factorizes into a Poisson KL for timing and a categorical KL for direction. The framework also covers masked and GIDD-style noise schedules. On OpenWebText, the model is, to the best of the authors' knowledge, the first purely uniform forward-process method to outperform mask-based baselines. Pretrained weights are released to facilitate reproducibility.
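The split into an exit rate and a jump distribution can be illustrated on a single row of a CTMC rate matrix. The snippet below is a minimal sketch, not the paper's network or sampler: it assumes time-homogeneous rates stored in a plain dictionary, and the names `decompose_rates` and `sample_jump` are hypothetical. Under constant rates, the holding time is exponential with the exit rate, and the destination is drawn from the normalized off-diagonal rates.

```python
import random


def decompose_rates(rates, current):
    """Split the outgoing rates of `current` into (exit_rate, jump_distribution).

    `rates` maps each state y to the transition rate current -> y.
    Assumption: rates are constant in time (time-homogeneous chain).
    """
    off_diag = {y: r for y, r in rates.items() if y != current and r > 0.0}
    exit_rate = sum(off_diag.values())  # total rate of leaving `current`
    if exit_rate == 0.0:
        return 0.0, {}  # absorbing state: no jump ever happens
    jump_dist = {y: r / exit_rate for y, r in off_diag.items()}
    return exit_rate, jump_dist


def sample_jump(rates, current, rng):
    """Sample (holding_time, next_state) for one jump out of `current`."""
    exit_rate, jump_dist = decompose_rates(rates, current)
    if exit_rate == 0.0:
        return float("inf"), current
    holding_time = rng.expovariate(exit_rate)  # "when to jump"
    states = list(jump_dist)
    weights = [jump_dist[y] for y in states]
    next_state = rng.choices(states, weights=weights, k=1)[0]  # "where to jump"
    return holding_time, next_state


if __name__ == "__main__":
    toy_rates = {"a": 0.0, "b": 1.5, "c": 0.5}  # illustrative values only
    print(decompose_rates(toy_rates, "a"))
    print(sample_jump(toy_rates, "a", random.Random(0)))
```

In this toy setting the two quantities are computed from known rates; in the approach summarized above they would instead be the outputs of two network heads.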

📝 Abstract

Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix as a single object (via concrete scores, clean-data predictions, i.e. $x_0$-parameterization, or denoising distributions) rather than aligning the parameterization with the intrinsic CTMC decomposition into jump timing and jump direction. Because a CTMC is a Poisson-driven jump process fully determined by these two quantities, decomposing along this structure is closer to first principles and naturally leads to our formulation. We propose Neural CTMC, which separately parameterizes the reverse process through an exit rate (when to jump) and a jump distribution (where to jump) using two dedicated network heads. We show that the evidence lower bound (ELBO) differs from a path-space KL divergence between the true and learned reverse processes by a $\theta$-independent constant, so that the training objective is fully governed by the exit rate and jump distribution we parameterize. Moreover, this KL factorizes into a Poisson KL for timing and a categorical KL for direction. We further show that the tractable conditional surrogate preserves the gradients and minimizers of the corresponding marginal reverse-process objective under standard regularity assumptions. Our theoretical framework also covers masked and GIDD-style noise schedules. Empirically, while the uniform forward process has been explored in prior work, our model is, to the best of our knowledge, the first pure-uniform method to outperform mask-based methods on the OpenWebText dataset. To facilitate reproducibility, we release our pretrained weights at https://huggingface.co/Jiangxy1117/Neural-CTMC.
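The factorization into a timing term and a direction term can be checked numerically with a standard identity for rate vectors. Writing true rates as $r_y = \lambda p_y$ and learned rates as $s_y = \mu q_y$, the generalized KL $\sum_y [r_y \log(r_y/s_y) - r_y + s_y]$ equals $[\lambda \log(\lambda/\mu) - \lambda + \mu] + \lambda\,\mathrm{KL}(p \,\|\, q)$, where the first bracket is the KL divergence between Poisson distributions with means $\lambda$ and $\mu$. The sketch below verifies this at a single state and time; the function names and numbers are illustrative assumptions, and the exact weighting and time integration used in the paper's ELBO may differ.

```python
import math


def split_rates(rates):
    """Return (exit_rate, jump_distribution) for a list of positive off-diagonal rates."""
    total = sum(rates)
    return total, [r / total for r in rates]


def rate_kl(r, s):
    """Generalized KL between two positive rate vectors over the same targets."""
    return sum(ry * math.log(ry / sy) - ry + sy for ry, sy in zip(r, s))


def timing_term(lam, mu):
    """KL between Poisson(lam) and Poisson(mu): the jump-timing part."""
    return lam * math.log(lam / mu) - lam + mu


def direction_term(lam, p, q):
    """Categorical KL(p || q), weighted by the true exit rate: the jump-direction part."""
    return lam * sum(py * math.log(py / qy) for py, qy in zip(p, q))


if __name__ == "__main__":
    true_rates = [0.6, 1.4]     # hypothetical "true" reverse rates
    learned_rates = [1.0, 0.5]  # hypothetical model rates
    lam, p = split_rates(true_rates)
    mu, q = split_rates(learned_rates)
    joint = rate_kl(true_rates, learned_rates)
    factored = timing_term(lam, mu) + direction_term(lam, p, q)
    print(joint, factored)  # the two values agree up to floating-point error
```

Because the two terms depend on disjoint model outputs ($\mu$ for timing, $q$ for direction), each can be driven to zero separately, which is consistent with assigning them to separate network heads.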

Problem

Research questions and friction points this paper is trying to address.

continuous-time Markov chain

discrete diffusion

jump timing

jump direction

reverse process

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural CTMC

continuous-time Markov chain

discrete diffusion