Molecular Generative Adversarial Network with Multi-Property Optimization

📅 2024-03-29

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

career value

220K/year

🤖 AI Summary

To address training instability, high computational cost, and poor scalability to large-scale chemical libraries in Monte Carlo Tree Search (MCTS)-based GAN approaches for molecular generation in drug discovery, this paper proposes InstGAN. The framework integrates token-level molecular representation, generative adversarial networks (GANs), and actor-critic reinforcement learning. It introduces a novel “instantaneous + global” dual-scale reward mechanism and incorporates a maximum-entropy policy to mitigate mode collapse. Optimizing multiple properties jointly—including drug-likeness, aqueous solubility, and target binding affinity—InstGAN achieves significant improvements over state-of-the-art GAN and RL baselines on ZINC and MOSES benchmarks. It matches or exceeds current SOTA methods in diversity, validity, and drug-likeness while substantially improving inference efficiency and enabling scalable deployment over billion-compound libraries.

Technology Category

Application Category

📝 Abstract

Deep generative models, such as generative adversarial networks (GANs), have been employed for $de~novo$ molecular generation in drug discovery. Most prior studies have utilized reinforcement learning (RL) algorithms, particularly Monte Carlo tree search (MCTS), to handle the discrete nature of molecular representations in GANs. However, due to the inherent instability in training GANs and RL models, along with the high computational cost associated with MCTS sampling, MCTS RL-based GANs struggle to scale to large chemical databases. To tackle these challenges, this study introduces a novel GAN based on actor-critic RL with instant and global rewards, called InstGAN, to generate molecules at the token-level with multi-property optimization. Furthermore, maximized information entropy is leveraged to alleviate the mode collapse. The experimental results demonstrate that InstGAN outperforms other baselines, achieves comparable performance to state-of-the-art models, and efficiently generates molecules with multi-property optimization. The source code will be released upon acceptance of the paper.

Problem

Research questions and friction points this paper is trying to address.

Overcoming instability in GAN and RL training for molecular generation

Reducing high computational costs of MCTS sampling in drug discovery

Achieving multi-property optimization in token-level molecule generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Actor-critic RL with instant rewards

Token-level molecular generation optimization

Maximized information entropy prevents mode collapse

🔎 Similar Papers

Genetic-guided GFlowNets for Sample Efficient Molecular Optimization