Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard discrete diffusion models map all unobserved states to a single [MASK] token, causing semantic information loss during denoising, an effect termed an "information void." To address this, we propose the Continuously Augmented Discrete Diffusion (CADD) framework, which pairs the discrete chain with an auxiliary diffusion process in a continuous latent space. Masked tokens thereby retain graded semantic structure, and the continuous latent variables explicitly guide discrete denoising. By integrating continuous latent modeling into a discrete diffusion backbone, CADD prevents semantic collapse and enables a controllable trade-off between mode-coverage and mode-seeking behavior. It preserves compatibility with existing training paradigms and requires no architectural modifications. Evaluated on text generation, image synthesis, and code modeling tasks, CADD consistently outperforms strong discrete baselines, with gains in both quantitative metrics (e.g., FID, BLEU, CodeBLEU) and qualitative assessments.

📝 Abstract
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion (CADD), a framework that augments the discrete state space with a paired diffusion in a continuous latent space. This yields graded, gradually corrupted states in which masked tokens are represented by noisy yet informative latent vectors rather than collapsed 'information voids'. At each reverse step, CADD may leverage the continuous latent as a semantic hint to guide discrete denoising. The design is clean and compatible with existing discrete diffusion training. At sampling time, the strength and choice of estimator for the continuous latent vector enables a controlled trade-off between mode-coverage (generating diverse outputs) and mode-seeking (generating contextually precise outputs) behaviors. Empirically, we demonstrate CADD improves generative quality over mask-based diffusion across text generation, image synthesis, and code modeling, with consistent gains on both qualitative and quantitative metrics against strong discrete baselines.
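The abstract's core idea, pairing an absorbing-mask discrete process with a Gaussian diffusion over token embeddings, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear masking schedule, the VP-style latent schedule, and all function names here are assumptions chosen for clarity.

```python
import numpy as np

def cadd_forward(tokens, embeddings, t, T, rng):
    """Jointly corrupt a sequence with CADD's paired processes (sketch).

    tokens:     (L,) int array of token ids.
    embeddings: (L, D) float array of clean token embeddings.
    Returns the masked token sequence (MASK encoded as -1) and the
    noisy continuous latents that accompany it.
    """
    mask_prob = t / T        # linear absorbing schedule (assumed)
    alpha_bar = 1.0 - t / T  # linear continuous noise schedule (assumed)

    # Discrete branch: each token independently absorbed into [MASK].
    masked = rng.random(tokens.shape) < mask_prob
    xt_tokens = np.where(masked, -1, tokens)

    # Continuous branch: embeddings are gradually noised, so a masked
    # position still carries a noisy-but-informative latent vector
    # instead of a collapsed "information void".
    noise = rng.standard_normal(embeddings.shape)
    xt_latent = np.sqrt(alpha_bar) * embeddings + np.sqrt(1.0 - alpha_bar) * noise
    return xt_tokens, xt_latent
```

At t = 0 nothing is corrupted; at t = T every token is masked, yet its latent remains a graded corruption of the original embedding, which is the "semantic hint" the reverse step can condition on.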
Problem

Research questions and friction points this paper is trying to address.

Masked discrete diffusion collapses all unobserved states into one [MASK] token, discarding semantics inferable from context
Denoising steps receive no signal from this "information void" between iterations
Samplers lack a mechanism to trade output diversity against contextual precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augments discrete state space with continuous latent diffusion
Uses noisy latent vectors instead of collapsed information voids
Enables controlled trade-off between diversity and precision
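The last bullet, using the continuous latent as a hint whose strength steers diversity versus precision, can be sketched as a simple guided-logits rule. The additive dot-product score and the weight `w` are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def guided_token_logits(base_logits, latent_hint, emb_table, w):
    """Blend discrete-denoiser logits with a continuous latent hint (sketch).

    base_logits: (V,) logits from the discrete denoiser.
    latent_hint: (D,) estimated clean latent for this position.
    emb_table:   (V, D) token embedding table.
    w:           guidance weight; w = 0 keeps the base distribution
                 (mode-covering), large w concentrates mass on tokens
                 whose embeddings match the hint (mode-seeking).
    """
    sim = emb_table @ latent_hint  # (V,) similarity of each token to the hint
    return base_logits + w * sim
```

Choosing the hint itself (e.g., a sampled latent versus an averaged estimate) and scaling `w` gives the controlled coverage/precision trade-off described above, without touching the discrete model's architecture.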