CANDI: Hybrid Discrete-Continuous Diffusion Models

📅 2025-10-25

🤖 AI Summary

Continuous diffusion models underperform purely discrete approaches on discrete data such as text tokens. The paper attributes this gap to temporal dissonance: Gaussian noise corrupts discrete data through two mechanisms, discrete identity corruption and continuous rank degradation, which scale differently with vocabulary size. This work introduces CANDI (Continuous ANd DIscrete diffusion), a hybrid framework that decouples discrete and continuous corruption so that a model can learn conditional structure and continuous geometry at the same time. Using token identifiability as the analytical framework, the authors characterize the dissonance, validate it empirically, and show that CANDI avoids it. CANDI supports classifier-based guidance with off-the-shelf classifiers through simple gradient addition, and it outperforms masked diffusion on text generation at low NFE (number of function evaluations).

📝 Abstract

While continuous diffusion has shown remarkable success in continuous domains such as image generation, its direct application to discrete data has underperformed compared to purely discrete formulations. This gap is counterintuitive, given that continuous diffusion learns score functions that enable joint evolution across multiple positions. To understand this gap, we introduce token identifiability as an analytical framework for understanding how Gaussian noise corrupts discrete data through two mechanisms: discrete identity corruption and continuous rank degradation. We reveal that these mechanisms scale differently with vocabulary size, creating a temporal dissonance: at noise levels where discrete corruption preserves enough structure for conditional learning, continuous denoising is trivial; at noise levels where continuous denoising is meaningful, discrete corruption destroys nearly all conditional structure. To solve this, we propose CANDI (Continuous ANd DIscrete diffusion), a hybrid framework that decouples discrete and continuous corruption, enabling simultaneous learning of both conditional structure and continuous geometry. We empirically validate the temporal dissonance phenomenon and demonstrate that CANDI successfully avoids it. This unlocks the benefits of continuous diffusion for discrete spaces: on controlled generation, CANDI enables classifier-based guidance with off-the-shelf classifiers through simple gradient addition; on text generation, CANDI outperforms masked diffusion at low NFE, demonstrating the value of learning continuous gradients for discrete spaces.
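One way to build intuition for the two corruption mechanisms is a small simulation. The sketch below is illustrative only: the function name `identifiability_stats`, the one-hot token representation, and the two statistics (argmax error rate as a stand-in for discrete identity corruption, normalized rank of the true token as a stand-in for continuous rank degradation) are assumptions made for this example and are not taken from the paper's definitions.

```python
import numpy as np


def identifiability_stats(vocab_size, sigma, n_samples=2000, seed=0):
    """Estimate two corruption statistics for a one-hot token under Gaussian noise.

    Assumed (hypothetical) definitions, chosen for illustration:
      - identity corruption: fraction of samples where argmax != true token
      - rank degradation: fraction of wrong tokens scoring above the true token
    """
    rng = np.random.default_rng(seed)
    # The noise is i.i.d., so the true token can be index 0 without loss of generality.
    clean = np.zeros((n_samples, vocab_size))
    clean[:, 0] = 1.0
    noisy = clean + sigma * rng.standard_normal((n_samples, vocab_size))

    # Discrete view: does the most likely token still match the true one?
    identity_corruption = np.mean(np.argmax(noisy, axis=1) != 0)

    # Continuous view: how many wrong coordinates outrank the true coordinate?
    outranked = np.sum(noisy[:, 1:] > noisy[:, :1], axis=1)
    rank_degradation = np.mean(outranked) / (vocab_size - 1)
    return identity_corruption, rank_degradation


if __name__ == "__main__":
    for vocab_size in (10, 1000):
        for sigma in (0.1, 0.5, 1.0):
            ident, rank = identifiability_stats(vocab_size, sigma)
            print(f"V={vocab_size:5d} sigma={sigma:.1f} "
                  f"identity_corruption={ident:.3f} rank_degradation={rank:.3f}")
```

Under these assumed definitions, the argmax error rate at a fixed noise level grows with vocabulary size, while the expected normalized rank does not depend on it. That is one simple way in which two views of the same Gaussian corruption can scale differently with vocabulary size.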

Problem

Research questions and friction points this paper is trying to address.

Understanding why continuous diffusion underperforms on discrete data

Proposing a hybrid framework that decouples discrete and continuous corruption

Bringing the benefits of continuous diffusion to discrete-space generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework decouples discrete and continuous corruption

Enables simultaneous learning of conditional structure and geometry

Learns continuous gradients for discrete spaces, enabling classifier guidance by simple gradient addition and stronger text generation at low NFE
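The phrase "simple gradient addition" from the abstract matches the standard form of classifier guidance, in which the gradient of a classifier's log-probability is added to the model's score. The sketch below shows that generic technique, not CANDI's implementation: the linear-softmax classifier, the names `classifier_log_prob_grad` and `guided_score`, and the `scale` parameter are all hypothetical choices for this example.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())


def classifier_log_prob_grad(x, W, b, target):
    """Gradient of log p(target | x) w.r.t. x for a toy linear-softmax classifier.

    For logits = W @ x + b, the gradient is W[target] - sum_k p_k * W[k].
    """
    probs = np.exp(log_softmax(W @ x + b))
    return W[target] - probs @ W


def guided_score(score, x, W, b, target, scale=1.0):
    """Classifier guidance: add the scaled classifier gradient to the score."""
    return score + scale * classifier_log_prob_grad(x, W, b, target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 5))   # toy classifier: 3 classes, 5-dim input
    b = rng.standard_normal(3)
    x = rng.standard_normal(5)        # stand-in for a noisy continuous state
    score = np.zeros(5)               # stand-in for the diffusion model's score
    direction = guided_score(score, x, W, b, target=1, scale=2.0)
    before = log_softmax(W @ x + b)[1]
    after = log_softmax(W @ (x + 0.01 * direction) + b)[1]
    print(f"log p(target | x) before: {before:.4f}, after a small guided step: {after:.4f}")
```

In a real sampler, `score` would come from the trained diffusion model and the gradient from an off-the-shelf classifier evaluated by automatic differentiation; the toy classifier here only keeps the example self-contained.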
