🤖 AI Summary
Discrete diffusion models face a fundamental challenge in biomolecular sequence design: balancing task-specific performance (e.g., protein stability, enhancer activity) with sequence naturalness. To address this, we propose DRAKES, an end-to-end differentiable framework that integrates reward optimization into discrete diffusion modeling. DRAKES applies the Gumbel-Softmax reparameterization across the entire sampling trajectory, making otherwise non-differentiable trajectories differentiable so that rewards can be backpropagated directly, and it optimizes rewards under a KL-divergence penalty against the pretrained model to preserve naturalness. Because discrete diffusion rests on continuous-time Markov chains rather than Brownian motion, this setting raises algorithmic and theoretical challenges distinct from those in continuous-domain diffusion, which DRAKES addresses with supporting theoretical analysis. Experiments demonstrate that DRAKES significantly outperforms existing baselines on both protein stability and DNA enhancer activity design tasks. Generated sequences exhibit superior functional efficacy while maintaining high naturalness, as measured by likelihood under pretrained language models and by structural plausibility. By unifying reward-driven control with principled discrete diffusion, DRAKES offers a practical route to controllable discrete sequence generation in computational biology.
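The KL-regularized reward maximization referred to above is commonly written as follows (the notation here is illustrative, not taken from the paper):

```latex
\max_{p_\theta} \; \mathbb{E}_{x \sim p_\theta}\!\left[ r(x) \right] \;-\; \alpha \, D_{\mathrm{KL}}\!\left( p_\theta \,\|\, p_{\mathrm{pre}} \right)
```

Here $r$ is the reward model mapping sequences to task objectives, $p_{\mathrm{pre}}$ is the pretrained discrete diffusion model, $p_\theta$ is the fine-tuned model, and $\alpha$ trades off reward against staying close to the natural sequence distribution.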
📝 Abstract
Recent studies have demonstrated the strong empirical performance of diffusion models on discrete sequences across domains from natural language to biological sequence generation. For example, in the protein inverse folding task, conditional diffusion models have achieved impressive results in generating natural-like sequences that fold back into the original structure. However, practical design tasks often require not only modeling a conditional distribution but also optimizing specific task objectives. For instance, we may prefer protein sequences with high stability. To address this, we consider the scenario where we have pretrained discrete diffusion models that can generate natural-like sequences, as well as reward models that map sequences to task objectives. We then formulate the reward maximization problem within discrete diffusion models, analogous to reinforcement learning (RL), while minimizing the KL divergence against pretrained diffusion models to preserve naturalness. To solve this RL problem, we propose a novel algorithm, DRAKES, that enables direct backpropagation of rewards through entire trajectories generated by diffusion models, by making the originally non-differentiable trajectories differentiable using the Gumbel-Softmax trick. Our theoretical analysis indicates that our approach can generate sequences that are both natural-like and yield high rewards. While similar tasks have been recently explored in diffusion models for continuous domains, our work addresses unique algorithmic and theoretical challenges specific to discrete diffusion models, which arise from their foundation in continuous-time Markov chains rather than Brownian motion. Finally, we demonstrate the effectiveness of DRAKES in generating DNA and protein sequences that optimize enhancer activity and protein stability, respectively, tasks that are important for gene therapies and protein-based therapeutics.
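The Gumbel-Softmax trick mentioned in the abstract replaces hard categorical sampling with a temperature-controlled relaxation that lives on the probability simplex, so gradients can flow through each sampled token. A minimal sketch of the relaxation itself (not the paper's implementation; the vocabulary, shapes, and function name are illustrative):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample: a differentiable surrogate for hard sampling.

    logits: (..., vocab) unnormalized log-probabilities from the diffusion model.
    tau:    temperature; as tau -> 0 the output approaches a hard one-hot vector.
    """
    rng = rng or np.random.default_rng(0)
    # Gumbel(0, 1) noise via the inverse-CDF transform of a uniform sample
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    y -= y.max(axis=-1, keepdims=True)        # numerical stability before exp
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)  # point on the probability simplex

# One denoising step for a hypothetical 4-token vocabulary at 3 positions:
logits = np.log(np.array([[0.7, 0.1, 0.1, 0.1]] * 3))
soft = gumbel_softmax(logits, tau=0.5)    # smooth, gradient-friendly sample
hard = gumbel_softmax(logits, tau=0.01)   # nearly one-hot, close to a real token
```

In a framework with autograd (e.g., PyTorch), using such relaxed samples at every denoising step is what lets a reward on the final sequence be backpropagated through the whole trajectory; the temperature trades off gradient quality against fidelity to discrete sampling.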