CANDI: Hybrid Discrete-Continuous Diffusion Models

📅 2025-10-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Continuous diffusion models significantly underperform purely discrete approaches on discrete data such as text tokens. The paper attributes this gap to a temporal dissonance between two ways Gaussian noise corrupts discrete data: discrete identity corruption and continuous rank degradation. It introduces CANDI, a hybrid diffusion framework that decouples discrete token corruption from continuous noise, so that a model can learn conditional structure and continuous geometry at the same time. The mismatch is characterized through a token identifiability analysis and validated empirically. Because CANDI retains continuous score functions, it supports classifier-based guidance through simple gradient addition, and on text generation it outperforms masked diffusion when the number of function evaluations (NFE) is low.

📝 Abstract
While continuous diffusion has shown remarkable success in continuous domains such as image generation, its direct application to discrete data has underperformed compared to purely discrete formulations. This gap is counterintuitive, given that continuous diffusion learns score functions that enable joint evolution across multiple positions. To understand this gap, we introduce token identifiability as an analytical framework for understanding how Gaussian noise corrupts discrete data through two mechanisms: discrete identity corruption and continuous rank degradation. We reveal that these mechanisms scale differently with vocabulary size, creating a temporal dissonance: at noise levels where discrete corruption preserves enough structure for conditional learning, continuous denoising is trivial; at noise levels where continuous denoising is meaningful, discrete corruption destroys nearly all conditional structure. To solve this, we propose CANDI (Continuous ANd DIscrete diffusion), a hybrid framework that decouples discrete and continuous corruption, enabling simultaneous learning of both conditional structure and continuous geometry. We empirically validate the temporal dissonance phenomenon and demonstrate that CANDI successfully avoids it. This unlocks the benefits of continuous diffusion for discrete spaces: on controlled generation, CANDI enables classifier-based guidance with off-the-shelf classifiers through simple gradient addition; on text generation, CANDI outperforms masked diffusion at low NFE, demonstrating the value of learning continuous gradients for discrete spaces.
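The two corruption mechanisms named in the abstract can be illustrated with a small Monte Carlo sketch. The definitions used here are assumptions for illustration and may differ from the paper's formal ones: a token is a one-hot vector plus isotropic Gaussian noise of scale `sigma`; "identity corruption" is taken to be the probability that the argmax of the noisy vector is no longer the true token; and "rank degradation" is taken to be the fraction of incorrect tokens whose noisy value exceeds the true token's value. The function name `corruption_stats` is hypothetical.

```python
import numpy as np

def corruption_stats(vocab_size, sigma, n_samples=2000, seed=0):
    """Estimate two corruption rates for a noisy one-hot token.

    Assumed definitions (illustrative, not taken from the paper):
      identity corruption: P(some incorrect token beats the true token)
      rank degradation:    P(a given incorrect token beats the true token)
    """
    rng = np.random.default_rng(seed)
    # Without loss of generality the true token is index 0.
    noisy = sigma * rng.standard_normal((n_samples, vocab_size))
    noisy[:, 0] += 1.0  # one-hot signal on the true token
    true_val = noisy[:, :1]            # shape (n_samples, 1)
    beaten = noisy[:, 1:] > true_val   # incorrect token exceeds true token
    identity_corruption = beaten.any(axis=1).mean()
    rank_degradation = beaten.mean()
    return identity_corruption, rank_degradation

if __name__ == "__main__":
    for vocab_size in (10, 100, 2000):
        ident, rank = corruption_stats(vocab_size, sigma=0.5)
        print(f"V={vocab_size:5d}  identity={ident:.3f}  rank={rank:.3f}")
```

Under these assumed definitions, the pairwise rate depends only on `sigma`, while the argmax-based rate grows with the vocabulary size, because more incorrect tokens get a chance to overtake the true one. This is one concrete way to see how the two mechanisms can scale differently with vocabulary size.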
Problem

Research questions and friction points this paper is trying to address.

Understanding why continuous diffusion underperforms on discrete data
Proposing a hybrid framework that decouples discrete and continuous corruption
Bringing the benefits of continuous diffusion to discrete-space generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework that decouples discrete and continuous corruption
Simultaneous learning of conditional structure and continuous geometry
Gradient-based benefits of continuous diffusion, such as classifier guidance, made available for discrete spaces
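The abstract describes classifier-based guidance as "simple gradient addition". A minimal sketch of that idea, assuming a continuous token representation `x`, a model score `score`, and a toy linear-softmax classifier with weights `W` and bias `b` (all hypothetical names; the paper's classifiers and update rule may differ), looks like this:

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over a 1-D logit vector.
    shifted = logits - logits.max()
    return shifted - np.log(np.exp(shifted).sum())

def class_log_prob(x, W, b, target):
    # log p(target | x) for a toy linear classifier (an assumption).
    return log_softmax(x @ W + b)[target]

def class_log_prob_grad(x, W, b, target):
    # Analytic gradient of log p(target | x) with respect to x:
    # W[:, target] minus the probability-weighted average of W's columns.
    probs = np.exp(log_softmax(x @ W + b))
    return W[:, target] - W @ probs

def guided_score(score, x, W, b, target, scale=1.0):
    # Guidance by gradient addition: unconditional score plus a
    # scaled classifier gradient.
    return score + scale * class_log_prob_grad(x, W, b, target)
```

With `scale=0.0` the guided score reduces to the unconditional score; larger values push samples harder toward the target class. A real classifier would typically supply the gradient through automatic differentiation rather than a closed form.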