🤖 AI Summary
This work addresses key challenges in goal-directed molecular generation: jointly optimizing multiple constraints, balancing conflicting objectives, and maintaining structural validity in a non-differentiable chemical space. It proposes CAGenMol, a condition-aware discrete diffusion framework that formulates molecular generation as a conditional denoising process guided by heterogeneous structural and property signals, and combines non-autoregressive discrete diffusion with reinforcement learning to align generation with non-differentiable objectives. This approach preserves chemical validity and diversity while enabling iterative refinement at inference time. Experiments show that CAGenMol consistently outperforms existing methods on structure-conditioned, property-conditioned, and dual-conditioned benchmarks, with improvements in binding affinity, drug-likeness, and success rate.
📝 Abstract
Goal-directed molecular generation requires satisfying heterogeneous constraints such as protein–ligand compatibility and multi-objective drug-like properties. Existing methods, however, often optimize these constraints in isolation, fail to reconcile conflicting objectives (e.g., affinity vs. safety), and struggle to navigate the non-differentiable chemical space without compromising structural validity. To address these challenges, we propose CAGenMol, a condition-aware discrete diffusion framework over molecular sequences that formulates molecular design as conditional denoising guided by heterogeneous structural and property signals. By coupling discrete diffusion with reinforcement learning, the model aligns the generation trajectory with non-differentiable objectives while preserving chemical validity and diversity. The non-autoregressive nature of the diffusion language model further enables iterative refinement of molecular fragments at inference time. Experiments on structure-conditioned, property-conditioned, and dual-conditioned benchmarks demonstrate consistent improvements over state-of-the-art methods in binding affinity, drug-likeness, and success rate, highlighting the effectiveness of our framework.
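The abstract does not specify the denoiser, tokenization, or unmasking schedule. As a rough illustration of what non-autoregressive iterative refinement over a molecular sequence can look like, the sketch below assumes a masked (absorbing-state) discrete diffusion model over character-level SMILES tokens, a confidence-ordered unmasking schedule, and a hypothetical `denoiser(tokens, condition)` callable standing in for a trained conditional network. The names `denoise`, `refine_fragment`, and `toy_denoiser` are illustrative only and are not CAGenMol's API; the reinforcement-learning alignment step is not shown.

```python
from typing import Callable, List, Sequence, Tuple

MASK = "[MASK]"

# Hypothetical interface: given the current (partially masked) token list and a
# condition, return one (predicted_token, confidence) pair per position.
Denoiser = Callable[[List[str], str], List[Tuple[str, float]]]


def denoise(tokens: Sequence[str], denoiser: Denoiser, condition: str, steps: int) -> List[str]:
    """Fill every MASK position over at most `steps` rounds, committing the most confident predictions first."""
    tokens = list(tokens)
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    for step in range(steps):
        if not masked:
            break
        preds = denoiser(tokens, condition)
        # Spread the remaining masked positions evenly over the remaining rounds
        # (ceiling division), so the final round commits everything left.
        k = -(-len(masked) // (steps - step))
        chosen = set(sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k])
        for i in chosen:
            tokens[i] = preds[i][0]
        masked = [i for i in masked if i not in chosen]
    return tokens


def refine_fragment(tokens: Sequence[str], start: int, end: int,
                    denoiser: Denoiser, condition: str, steps: int) -> List[str]:
    """Re-mask positions [start, end) and regenerate only that fragment; other tokens stay fixed."""
    remasked = [MASK if start <= i < end else t for i, t in enumerate(tokens)]
    return denoise(remasked, denoiser, condition, steps)


def toy_denoiser(tokens: List[str], condition: str) -> List[Tuple[str, float]]:
    """Stand-in for a trained conditional model: always proposes acetic acid, CC(=O)O."""
    target = list("CC(=O)O")  # character-level SMILES tokens (an assumption)
    return [(target[i], 1.0 - 0.1 * i) for i in range(len(tokens))]


if __name__ == "__main__":
    mol = denoise([MASK] * 7, toy_denoiser, condition="high QED", steps=3)
    print("".join(mol))  # CC(=O)O
    edited = refine_fragment(list("CCCC"), 1, 3, lambda t, c: [("N", 0.5)] * len(t), "", steps=2)
    print("".join(edited))  # CNNC
```

Because every position is predicted in parallel, `refine_fragment` can rewrite a substructure in the middle of the sequence without regenerating the prefix, which a left-to-right autoregressive decoder cannot do directly. Sequences produced this way would still need a chemical validity check (for example, parsing with RDKit), which this sketch omits.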