Constrained Code Generation with Discrete Diffusion

📅 2026-05-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing code generation approaches struggle to enforce program-level constraints such as functionality and security, and often produce outputs that violate the specification. This work proposes Constrained Diffusion for Code (CDC), a training-free neurosymbolic inference framework that treats the global program state exposed at each denoising step of a discrete diffusion model as an intervention point. During denoising, CDC applies constraint-aware operators that combine program analysis with mathematical optimization to find the constraint-relevant regions of the intermediate program and locally adjust the generation trajectory toward feasible programs, while staying close to the base model. Because it requires no retraining, CDC can be layered on an existing discrete diffusion sampler to enforce syntactic, functional, and security constraints. Across code generation benchmarks, CDC consistently improves constraint satisfaction over autoregressive and discrete diffusion baselines, with less corrective computation and more localized edits.
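To make "constraint-relevant regions" concrete, the following is a minimal sketch of a program-analysis check that localizes violations to specific lines. It is an illustration only, not the paper's analysis: the `BANNED_CALLS` deny-list and the `find_violations` helper are hypothetical names chosen for this example, and the check covers only syntax errors and direct calls to banned built-ins.

```python
import ast

# Hypothetical deny-list standing in for a security constraint (assumption).
BANNED_CALLS = {"eval", "exec"}


def find_violations(source: str) -> list[tuple[int, str]]:
    """Return sorted (line_number, reason) pairs for constraint violations."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        # A syntax violation is localized to the line the parser reports.
        return [(err.lineno or 1, "syntax error")]

    violations = []
    for node in ast.walk(tree):
        # Flag direct calls such as eval(...), not aliased or attribute calls.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            violations.append((node.lineno, f"banned call: {node.func.id}"))
    return sorted(violations)


if __name__ == "__main__":
    print(find_violations("x = 1\ny = eval('x + 1')\n"))
```

A checker of this shape returns locations rather than a pass/fail verdict, which is what allows a sampler to revise only the offending region instead of regenerating the whole program.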
📝 Abstract
Discrete diffusion models are a powerful, emerging paradigm for code generation. They construct programs through iterative refinement of partially corrupted token sequences and enable parallel token refinement. Importantly, this paradigm exposes a global program state at each denoising step, which provides a natural intervention point for enforcing program-level functionality and security constraints, guiding the generation before the final code is committed. Building on this observation, the paper introduces Constrained Diffusion for Code (CDC), a training-free neurosymbolic inference framework that integrates constraint satisfaction directly into the reverse denoising process. CDC augments the base discrete diffusion sampler with constraint-aware denoising operators that combine mathematical optimization with program analysis to identify constraint-relevant regions of the intermediate program state and locally adjust the denoising trajectory, steering generation toward feasible programs while remaining close to the base model. Across code generation benchmarks, CDC consistently improves constraint satisfaction in functional correctness, security, and even syntax, outperforming discrete diffusion and autoregressive baselines with less corrective computation and more localized edits.
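The idea of intervening on the intermediate program state can be sketched with a toy unmasking loop followed by a per-step correction pass. Everything here is an assumption made for illustration: `toy_proposals` is a hand-written stand-in for a base model's ranked token predictions, the constraint is a simple banned-token rule, and the correction is a greedy substitution rather than the optimization-based operators the abstract describes.

```python
MASK = "<mask>"
BANNED = {"eval"}  # hypothetical security constraint (assumption)


def toy_proposals(position: int) -> list[str]:
    """Stand-in for a base model: ranked candidate tokens per position."""
    ranked = {
        0: ["y", "z"],
        1: ["=", ":"],
        2: ["eval", "int", "float"],
        3: ["(", "["],
        4: ["s", "t"],
        5: [")", "]"],
    }
    return ranked[position]


def violates(token: str) -> bool:
    return token in BANNED


def constrained_denoise(length: int, per_step: int = 2):
    """Unmask `per_step` tokens per step, correcting violations as they appear."""
    state = [MASK] * length
    edited = []  # positions where the correction pass overrode the base pick
    while MASK in state:
        masked = [i for i, tok in enumerate(state) if tok == MASK]
        # Base sampler step: fill some masked slots with the top-ranked token.
        for i in masked[:per_step]:
            state[i] = toy_proposals(i)[0]
        # Correction pass: inspect the whole intermediate state and replace
        # only violating tokens with the best-ranked feasible alternative.
        for i, tok in enumerate(state):
            if tok != MASK and violates(tok):
                state[i] = next(c for c in toy_proposals(i) if not violates(c))
                edited.append(i)
    return state, edited


if __name__ == "__main__":
    tokens, edited = constrained_denoise(6)
    print(" ".join(tokens), edited)
```

In this toy run only position 2 is changed, and every other token keeps the base model's first choice, which mirrors the abstract's goal of localized edits that stay close to the base model.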
Problem

Research questions and friction points this paper is trying to address.

constrained code generation
discrete diffusion
program constraints
code correctness
code security
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion
constrained code generation
neurosymbolic inference
constraint-aware denoising
program synthesis