Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain for Molecule and Biological Sequence Design

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing discrete diffusion models for molecular and biological sequence design: they rely on noisy intermediate rewards for guidance and consequently produce suboptimal samples. To overcome this, the authors propose the Clean-Sample Markov Chain (CSMC) sampler, which at inference time constructs a Markov chain over clean samples whose stationary distribution is the target distribution. Using the Metropolis–Hastings algorithm, CSMC enables reliable reward-guided sampling without depending on noisy intermediate rewards. The key design is a proposal distribution, built by sequentially applying the forward and backward diffusion processes, for which the acceptance probability can be computed exactly, integrating discrete diffusion dynamics with reward guidance. Experiments show that CSMC consistently outperforms prior approaches across multiple molecule and biological sequence generation tasks, yielding samples with substantially higher reward values.

📝 Abstract
Discrete diffusion models have recently emerged as a powerful class of generative models for chemistry and biology data. In these fields, the goal is to generate various samples with high rewards (e.g., drug-likeness in molecules), making reward-based guidance crucial. Most existing methods are based on guiding the diffusion model using intermediate rewards but tend to underperform since intermediate rewards are noisy due to the non-smooth nature of reward functions used in scientific domains. To address this, we propose Clean-Sample Markov Chain (CSMC) Sampler, a method that performs effective test-time reward-guided sampling for discrete diffusion models, enabling local search without relying on intermediate rewards. CSMC constructs a Markov chain of clean samples using the Metropolis-Hastings algorithm such that its stationary distribution is the target distribution. We design a proposal distribution by sequentially applying the forward and backward diffusion processes, making the acceptance probability tractable. Experiments on molecule and biological sequence generation with various reward functions demonstrate that our method consistently outperforms prior approaches that rely on intermediate rewards.
Problem

Research questions and friction points this paper is trying to address.

discrete diffusion
reward-guided generation
molecule design
biological sequence design
noisy intermediate rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion
reward-guided generation
Clean-Sample Markov Chain
Metropolis-Hastings
molecule design