🤖 AI Summary
De novo drug design faces three persistent challenges: inefficient exploration of chemical space, costly evaluation (e.g., physics simulations or human feedback), and mode collapse in reinforcement learning (RL). To address these, this paper proposes a diversity-driven RL framework. Its core contribution is using Determinantal Point Processes (DPPs) for RL mini-batch selection: pairwise molecular similarities are modeled explicitly so that policy updates prioritize subsets of molecules that are both structurally diverse and high quality. The method is evaluated with three well-established molecular generation oracles over many generative steps. Results show significant improvements in the diversity of generated molecules, measured by FCD and SNN scores (+12–28% over baselines), while preserving drug-likeness (unchanged QED and SA scores) and target activity. The work points toward more efficient and robust molecular generation in settings where evaluation is expensive.
📝 Abstract
In many real-world applications, evaluating the quality of an instance is costly and time-consuming, e.g., when it requires human feedback or physics simulations, whereas proposing new instances is comparatively cheap. This is especially critical in reinforcement learning, where each new interaction with the environment (i.e., each new instance) must be evaluated to provide a reward signal to learn from. Because sufficient exploration is crucial, learning from a diverse mini-batch can have a large impact and help mitigate mode collapse. In this paper, we introduce diverse mini-batch selection for reinforcement learning and propose to use determinantal point processes for this task. We study this framework in the context of a real-world problem, namely drug discovery. Specifically, we examine experimentally how the framework can improve the effectiveness of chemical exploration in de novo drug design, where finding diverse and high-quality solutions is essential. We conduct a comprehensive evaluation with three well-established molecular generation oracles over many generative steps. Our experiments show that the diverse mini-batch selection framework can substantially improve the diversity of the solutions while still obtaining solutions of high quality. In drug discovery, such an outcome can potentially help address unmet medication needs faster.
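To make the idea of diverse mini-batch selection concrete, the following is a minimal sketch and not the paper's implementation. It assumes a quality-weighted kernel L = diag(q) S diag(q), where q holds per-molecule scores and S is a pairwise similarity matrix, and it uses a greedy determinant-maximizing approximation in place of sampling from a DPP. The function names `build_kernel` and `greedy_dpp_select`, the kernel form, and the toy numbers are all illustrative assumptions; the abstract does not specify the kernel or the selection procedure.

```python
import numpy as np


def build_kernel(quality, similarity):
    """Assumed kernel form: L = diag(q) @ S @ diag(q)."""
    q = np.asarray(quality, dtype=float)
    S = np.asarray(similarity, dtype=float)
    return q[:, None] * S * q[None, :]


def greedy_dpp_select(L, k):
    """Greedily pick up to k indices that approximately maximize det(L[sub, sub]).

    This is a simple greedy approximation, not exact DPP sampling.
    """
    n = L.shape[0]
    selected = []
    for _ in range(min(k, n)):
        best_idx, best_logdet = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            # Only consider candidates that keep the submatrix positive definite.
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break  # no candidate adds volume; stop early
        selected.append(best_idx)
    return selected


# Toy example (hypothetical numbers): molecules 0 and 1 are near-duplicates.
similarity = np.array([
    [1.00, 0.99, 0.10, 0.10],
    [0.99, 1.00, 0.10, 0.10],
    [0.10, 0.10, 1.00, 0.20],
    [0.10, 0.10, 0.20, 1.00],
])
quality = [1.0, 0.95, 0.8, 0.7]

L = build_kernel(quality, similarity)
minibatch = greedy_dpp_select(L, k=2)
print(minibatch)  # → [0, 2]
```

In this toy case, picking the top two by score alone would return the near-duplicate pair 0 and 1. The determinant-based criterion instead skips molecule 1, because it adds almost no volume once molecule 0 is selected, and takes the lower-scoring but dissimilar molecule 2.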