MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high cost of oracle evaluations and the resulting low sample efficiency in molecular optimization, where a method must improve target properties while preserving structural similarity to the lead compound under a limited oracle budget. The authors propose MolMem, a multi-turn agentic reinforcement learning framework with a dual long-term memory system. Static Exemplar Memory retrieves relevant exemplars to ground cold-start exploration, and Evolving Skill Memory distills successful trajectories into reusable strategies; the policy is trained with dense step-wise rewards. With only 500 oracle calls, MolMem reaches a 90% success rate on single-property tasks (1.5× over the best baseline) and 52% on multi-property tasks.

📝 Abstract

In drug discovery, molecular optimization aims to iteratively refine a lead compound to improve molecular properties while preserving structural similarity to the original molecule. However, each oracle evaluation is expensive, making sample efficiency a key challenge for existing methods under a limited oracle budget. Trial-and-error approaches require many oracle calls, while methods that leverage external knowledge tend to reuse familiar templates and struggle on challenging objectives. A key missing piece is long-term memory that can ground decisions and provide reusable insights for future optimizations. To address this, we present MolMem (Molecular optimization with Memory), a multi-turn agentic reinforcement learning (RL) framework with a dual-memory system. Specifically, MolMem uses Static Exemplar Memory to retrieve relevant exemplars for cold-start grounding, and Evolving Skill Memory to distill successful trajectories into reusable strategies. Built on this memory-augmented formulation, we train the policy with dense step-wise rewards, turning costly rollouts into long-term knowledge that improves future optimization. Extensive experiments show that MolMem achieves 90% success on single-property tasks (1.5× over the best baseline) and 52% on multi-property tasks using only 500 oracle calls. Our code is available at https://github.com/REAL-Lab-NU/MolMem.
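
The dual-memory idea and the fixed oracle budget described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the MolMem code: the class and function names (`ngrams`, `tanimoto`, `StaticExemplarMemory`, `EvolvingSkillMemory`, `BudgetedOracle`) are hypothetical, character n-grams of a SMILES string stand in for a real molecular fingerprint, and skills are stored as plain text summaries rather than whatever representation the authors use.

```python
from dataclasses import dataclass, field


def ngrams(smiles: str, n: int = 2) -> frozenset:
    """Character n-grams of a SMILES string (toy stand-in for a fingerprint)."""
    return frozenset(smiles[i:i + n] for i in range(max(len(smiles) - n + 1, 1)))


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


@dataclass
class StaticExemplarMemory:
    """Fixed store of (source, improved) pairs, queried by similarity to the lead."""
    exemplars: list

    def retrieve(self, lead: str, k: int = 2) -> list:
        query = ngrams(lead)
        ranked = sorted(
            self.exemplars,
            key=lambda pair: tanimoto(query, ngrams(pair[0])),
            reverse=True,
        )
        return ranked[:k]


@dataclass
class EvolvingSkillMemory:
    """Grows over time: keeps a text summary of each successful trajectory."""
    skills: list = field(default_factory=list)

    def distill(self, trajectory: list, success: bool) -> None:
        if not success:
            return  # only successful trajectories become skills in this sketch
        summary = " -> ".join(trajectory)
        if summary not in self.skills:
            self.skills.append(summary)


class BudgetedOracle:
    """Wraps a scoring function and refuses calls beyond a fixed budget."""

    def __init__(self, score_fn, budget: int = 500):
        self.score_fn = score_fn
        self.budget = budget
        self.calls = 0

    def __call__(self, smiles: str) -> float:
        if self.calls >= self.budget:
            raise RuntimeError("oracle budget exhausted")
        self.calls += 1
        return self.score_fn(smiles)
```

In this sketch, an agent would call `retrieve` once at the start of an episode to ground its first edits, query the oracle through `BudgetedOracle` so that every evaluation counts against the budget, and call `distill` at the end of the episode so later episodes can reuse what worked. A real system would likely replace `ngrams` with a chemistry-aware fingerprint.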

Problem

Research questions and friction points this paper is trying to address.

molecular optimization

sample efficiency

oracle budget

drug discovery

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

memory-augmented reinforcement learning

molecular optimization

sample efficiency