🤖 AI Summary
Discrete diffusion models face a fundamental challenge in biomolecular sequence design: balancing task-specific performance (e.g., protein stability, enhancer activity) with sequence naturalness. To address this, we propose DRAKES, an end-to-end differentiable framework that integrates reward optimization into discrete diffusion modeling. DRAKES applies the Gumbel-Softmax reparameterization across the entire sampling trajectory, making otherwise non-differentiable trajectories differentiable so that rewards can be backpropagated directly, and it optimizes rewards under a KL-divergence penalty against the pretrained model to preserve naturalness. Because discrete diffusion rests on continuous-time Markov chains rather than Brownian motion, this setting raises algorithmic and theoretical challenges distinct from those in continuous-domain diffusion, which DRAKES addresses with supporting theoretical analysis. Experiments demonstrate that DRAKES significantly outperforms existing baselines on both protein stability and DNA enhancer activity design tasks. Generated sequences exhibit superior functional efficacy while maintaining high naturalness, as measured by likelihood under pretrained language models and by structural plausibility. By unifying reward-driven control with principled discrete diffusion, DRAKES offers a practical route to controllable discrete sequence generation in computational biology.
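The KL-regularized reward maximization referred to above is commonly written as follows (the notation here is illustrative, not taken from the paper):

```latex
\max_{p_\theta} \; \mathbb{E}_{x \sim p_\theta}\!\left[ r(x) \right] \;-\; \alpha \, D_{\mathrm{KL}}\!\left( p_\theta \,\|\, p_{\mathrm{pre}} \right)
```

Here $r$ is the reward model mapping sequences to task objectives, $p_{\mathrm{pre}}$ is the pretrained discrete diffusion model, $p_\theta$ is the fine-tuned model, and $\alpha$ trades off reward against staying close to the natural sequence distribution.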
📝 Abstract
Recent studies have demonstrated the strong empirical performance of diffusion models on discrete sequences across domains from natural language to biological sequence generation. For example, in the protein inverse folding task, conditional diffusion models have achieved impressive results in generating natural-like sequences that fold back into the original structure. However, practical design tasks often require not only modeling a conditional distribution but also optimizing specific task objectives. For instance, we may prefer protein sequences with high stability. To address this, we consider the scenario where we have pretrained discrete diffusion models that can generate natural-like sequences, as well as reward models that map sequences to task objectives. We then formulate the reward maximization problem within discrete diffusion models, analogous to reinforcement learning (RL), while minimizing the KL divergence against pretrained diffusion models to preserve naturalness. To solve this RL problem, we propose a novel algorithm, DRAKES, that enables direct backpropagation of rewards through entire trajectories generated by diffusion models, by making the originally non-differentiable trajectories differentiable using the Gumbel-Softmax trick. Our theoretical analysis indicates that our approach can generate sequences that are both natural-like and yield high rewards. While similar tasks have been recently explored in diffusion models for continuous domains, our work addresses unique algorithmic and theoretical challenges specific to discrete diffusion models, which arise from their foundation in continuous-time Markov chains rather than Brownian motion. Finally, we demonstrate the effectiveness of DRAKES in generating DNA and protein sequences that optimize enhancer activity and protein stability, respectively, tasks that are important for gene therapies and protein-based therapeutics.
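The Gumbel-Softmax trick mentioned in the abstract replaces hard categorical sampling with a temperature-controlled relaxation that lives on the probability simplex, so gradients can flow through each sampled token. A minimal sketch of the relaxation itself (not the paper's implementation; the vocabulary, shapes, and function name are illustrative):

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample: a differentiable surrogate for hard sampling.

    logits: (..., vocab) unnormalized log-probabilities from the diffusion model.
    tau:    temperature; as tau -> 0 the output approaches a hard one-hot vector.
    """
    rng = rng or np.random.default_rng(0)
    # Gumbel(0, 1) noise via the inverse-CDF transform of a uniform sample
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    y -= y.max(axis=-1, keepdims=True)        # numerical stability before exp
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)  # point on the probability simplex

# One denoising step for a hypothetical 4-token vocabulary at 3 positions:
logits = np.log(np.array([[0.7, 0.1, 0.1, 0.1]] * 3))
soft = gumbel_softmax(logits, tau=0.5)    # smooth, gradient-friendly sample
hard = gumbel_softmax(logits, tau=0.01)   # nearly one-hot, close to a real token
```

In a framework with autograd (e.g., PyTorch), using such relaxed samples at every denoising step is what lets a reward on the final sequence be backpropagated through the whole trajectory; the temperature trades off gradient quality against fidelity to discrete sampling.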