🤖 AI Summary
In biomolecular design, reward functions are often non-differentiable, e.g., derived from physics-based simulations or domain-specific heuristics, which makes reward-guided training of diffusion models prone to instability, low sample efficiency, and mode collapse. To address this, we propose an iterative policy distillation framework that enables efficient fine-tuning of diffusion models under arbitrary (including non-differentiable) rewards. Our method combines off-policy data reuse, soft-optimal policy modeling, and model updates that minimize the KL divergence to the simulated soft-optimal policy, stabilizing learning and preserving sample diversity. Compared to existing reinforcement learning (RL)-based approaches, it significantly improves training stability and sample efficiency while avoiding mode collapse. We validate the framework on protein, small-molecule, and regulatory DNA design tasks, achieving state-of-the-art reward optimization in all settings. The results demonstrate the method's generality, its robustness to diverse reward formulations, and the scientific validity of its designs in real-world biomolecular design applications.
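Schematically, each iteration fits the model $p_\theta$ to the reward-tilted (soft-optimal) policy induced by the previous model $p_{\text{old}}$ and reward $r$ at a temperature $\alpha$; the notation below is our own paraphrase of this objective, not taken from the paper:

$$
\theta \;\leftarrow\; \arg\min_{\theta}\, D_{\mathrm{KL}}\!\left(p^{\star} \,\middle\|\, p_{\theta}\right),
\qquad
p^{\star}(x) \;\propto\; p_{\text{old}}(x)\,\exp\!\left(\frac{r(x)}{\alpha}\right).
$$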
📝 Abstract
We address the problem of fine-tuning diffusion models for reward-guided generation in biomolecular design. While diffusion models have proven highly effective at modeling complex, high-dimensional data distributions, real-world applications often demand more than high-fidelity generation: they require optimization with respect to potentially non-differentiable reward functions, such as physics-based simulations or rewards grounded in scientific knowledge. Although reinforcement learning (RL) methods have been explored for fine-tuning diffusion models toward such objectives, they often suffer from instability, low sample efficiency, and mode collapse due to their on-policy nature. In this work, we propose an iterative distillation-based fine-tuning framework that enables diffusion models to optimize arbitrary reward functions. Our method casts the problem as policy distillation: it collects off-policy data during a roll-in phase, simulates reward-based soft-optimal policies during roll-out, and updates the model by minimizing the KL divergence between the simulated soft-optimal policy and the current model policy. The off-policy formulation, combined with KL-divergence minimization, improves training stability and sample efficiency over existing RL-based methods. Empirical results demonstrate the effectiveness of our approach and its superior reward optimization across diverse tasks in protein, small-molecule, and regulatory DNA design.
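To make the roll-in / roll-out / distillation loop concrete, below is a minimal, self-contained sketch of one such round on a toy 1-D problem. The learnable Gaussian stands in for the diffusion model's sampling policy, and `reward_fn`, `alpha`, and all other names are illustrative assumptions, not the paper's actual interface or implementation.

```python
import math
import torch

torch.manual_seed(0)

# Toy stand-in for the diffusion policy: a learnable 1-D Gaussian.
mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([mu, log_sigma], lr=1e-2)

def log_prob(x):
    # Gaussian log-density under the current model parameters.
    sigma = log_sigma.exp()
    return -0.5 * ((x - mu) / sigma) ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)

def reward_fn(x):
    # Black-box, non-differentiable reward: only its values are used.
    return -(x - 2.0).abs()

alpha = 0.5  # soft-optimality temperature (an assumed hyperparameter)

for step in range(300):
    # Roll-in: draw samples from a frozen snapshot of the current model.
    # No gradients flow through sampling, so the data are off-policy
    # with respect to the update.
    with torch.no_grad():
        x = mu + log_sigma.exp() * torch.randn(256, 1)
        r = reward_fn(x).squeeze(-1)

    # Roll-out: form soft-optimal targets by exponentially tilting the
    # samples with their rewards (self-normalized importance weights).
    w = torch.softmax(r / alpha, dim=0)

    # Distillation: minimizing KL(soft-optimal || model) on these samples
    # reduces to a reward-weighted negative log-likelihood.
    loss = -(w * log_prob(x).squeeze(-1)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"learned mean ≈ {mu.item():.2f} (reward peaks at x = 2.0)")
```

Because the reward only enters through the importance weights `w`, it can be any black-box score (a physics simulation, a heuristic filter), which is the property the paper exploits for non-differentiable objectives.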