MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization

📅 2026-04-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the high cost of oracle evaluations and the resulting low sample efficiency in molecular optimization, where improving target properties while preserving structural similarity to the lead compound is difficult under a limited oracle budget. The authors propose MolMem, a multi-turn agentic reinforcement learning framework with a dual long-term memory system. Static Exemplar Memory retrieves relevant exemplars to ground cold-start exploration, and Evolving Skill Memory distills successful trajectories into reusable strategies; the policy is trained with dense step-wise rewards. With only 500 oracle calls, MolMem attains a 90% success rate on single-property tasks (1.5× the best baseline) and 52% on multi-property tasks.
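
To make the fixed oracle budget concrete, here is a minimal Python sketch of a budget-limited oracle wrapper. It is an illustration only and is not taken from the MolMem code: the names `BudgetedOracle`, `BudgetExhausted`, and `toy_score` are hypothetical, the choice to cache repeated molecules so they are not charged twice is an assumption, and the scoring function is a placeholder rather than a real property predictor.

```python
from typing import Callable, Dict


class BudgetExhausted(RuntimeError):
    """Raised when a new molecule is scored after the budget is spent."""


class BudgetedOracle:
    """Hypothetical wrapper that counts unique oracle evaluations."""

    def __init__(self, score_fn: Callable[[str], float], budget: int = 500) -> None:
        self.score_fn = score_fn
        self.budget = budget
        self.cache: Dict[str, float] = {}  # SMILES string -> cached score

    @property
    def calls_used(self) -> int:
        # Assumption: only unique molecules count against the budget.
        return len(self.cache)

    def __call__(self, smiles: str) -> float:
        if smiles in self.cache:
            return self.cache[smiles]  # repeated query, no new oracle call
        if self.calls_used >= self.budget:
            raise BudgetExhausted(f"oracle budget of {self.budget} calls is spent")
        score = self.score_fn(smiles)
        self.cache[smiles] = score
        return score


def toy_score(smiles: str) -> float:
    """Placeholder property: fraction of 'C' characters (not a real oracle)."""
    return smiles.count("C") / max(len(smiles), 1)
```

An optimizer would call `oracle(smiles)` for each candidate and stop when `BudgetExhausted` is raised or `oracle.calls_used` reaches the budget; under this accounting, sample efficiency is the success achieved per unique call.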

📝 Abstract
In drug discovery, molecular optimization aims to iteratively refine a lead compound to improve molecular properties while preserving structural similarity to the original molecule. However, each oracle evaluation is expensive, making sample efficiency a key challenge for existing methods under a limited oracle budget. Trial-and-error approaches require many oracle calls, while methods that leverage external knowledge tend to reuse familiar templates and struggle on challenging objectives. A key missing piece is long-term memory that can ground decisions and provide reusable insights for future optimizations. To address this, we present MolMem (Molecular optimization with Memory), a multi-turn agentic reinforcement learning (RL) framework with a dual-memory system. Specifically, MolMem uses Static Exemplar Memory to retrieve relevant exemplars for cold-start grounding, and Evolving Skill Memory to distill successful trajectories into reusable strategies. Built on this memory-augmented formulation, we train the policy with dense step-wise rewards, turning costly rollouts into long-term knowledge that improves future optimization. Extensive experiments show that MolMem achieves 90% success on single-property tasks (1.5× over the best baseline) and 52% on multi-property tasks using only 500 oracle calls. Our code is available at https://github.com/REAL-Lab-NU/MolMem.
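
The dual-memory idea in the abstract can be pictured with a small Python sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the class and method names are hypothetical, character n-grams stand in for a real molecular fingerprint, and the "skill" is a plain string summary, whereas the abstract does not say how MolMem represents or distills strategies. The dense step-wise reward is not sketched because its form is not given.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def ngrams(smiles: str, n: int = 2) -> Set[str]:
    """Character n-grams: a crude stand-in for a molecular fingerprint."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class StaticExemplarMemory:
    """Fixed store of (source SMILES, improved SMILES) pairs (hypothetical layout)."""
    exemplars: List[Tuple[str, str]]

    def retrieve(self, query: str, k: int = 1) -> List[Tuple[str, str]]:
        # Rank exemplars by similarity of their source molecule to the query.
        q = ngrams(query)
        ranked = sorted(
            self.exemplars,
            key=lambda pair: tanimoto(q, ngrams(pair[0])),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class EvolvingSkillMemory:
    """Grows over time as successful trajectories are summarized."""
    skills: List[str] = field(default_factory=list)

    def distill(self, trajectory: List[str], success: bool) -> None:
        # Placeholder distillation: keep a one-line summary of successful runs only.
        if not success or len(trajectory) < 2:
            return
        skill = f"{trajectory[0]} -> {trajectory[-1]} in {len(trajectory) - 1} step(s)"
        if skill not in self.skills:
            self.skills.append(skill)
```

In this sketch, `retrieve` would be called once at the start of an optimization to ground the first edits, while `distill` would be called after each finished rollout so that later optimizations can read the accumulated `skills`.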
Problem

Research questions and friction points this paper is trying to address.

molecular optimization
sample efficiency
oracle budget
drug discovery
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-augmented reinforcement learning
molecular optimization
sample efficiency
dual-memory system
agentic RL