ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery

2026-05-12

AI Summary
This work proposes ToolMol, a novel framework that integrates multi-objective genetic algorithms with tool-augmented large language model (LLM) agents to overcome the limitations of existing LLM-driven molecular generation methods, which often produce invalid or low-quality ligands due to syntactic constraints in molecular string representations. By leveraging precise chemical operations from the RDKit toolkit and guided by chain-of-thought reasoning, ToolMol iteratively optimizes ligand populations for efficient and synthesizable de novo small-molecule design. Experimental results across three protein targets demonstrate that ToolMol generates candidates with over 10% improvement in binding affinity and achieves absolute binding free energy scores more than 35% better than current state-of-the-art methods, while maintaining high validity and structural diversity.
Abstract
Advances in large language models (LLMs) have recently opened new and promising avenues for small-molecule drug discovery. Yet existing LLM-based approaches for molecular generation often suffer from high rates of invalid and low-quality ligand candidates, a result of the syntactic limitations of current models with regard to molecular strings. In this paper, we introduce $\texttt{ToolMol}$, an evolutionary agentic framework for de novo drug design. $\texttt{ToolMol}$ combines a multi-objective genetic algorithm with an agentic LLM operator that iteratively updates the ligand population. We build a comprehensive toolbox of RDKit-backed functions that allows our agentic operator to consistently make precise ligand modifications. $\texttt{ToolMol}$ achieves state-of-the-art performance on multi-objective property optimization tasks, discovering drug-like and synthesizable ligands with $>10\%$ stronger predicted binding affinity than existing methods across three protein targets. $\texttt{ToolMol}$ ligands additionally achieve state-of-the-art results in gold-standard Absolute Binding Free Energy scores, improving on existing methods by more than $35\%$. By studying chain-of-thought reasoning traces, we observe that tool-calling enables the model to more faithfully execute its planned modifications, efficiently exploiting the strong chemical prior knowledge in LLMs.
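To make the loop described in the abstract concrete, the following is a minimal sketch of a multi-objective evolutionary update with Pareto-based survivor selection. It is an illustration under stated assumptions, not the paper's implementation: the names `dominates`, `pareto_front`, `evolve`, `toy_propose`, and `toy_score` are hypothetical, candidates are plain strings standing in for ligands, the proposal step is a random stub where ToolMol would use its tool-calling LLM operator, and the truncation rule for filling the population is a simple placeholder for whatever selection scheme the authors use.

```python
import random
from typing import Callable, List, Sequence, Tuple

Scores = Tuple[float, ...]  # one value per objective, higher is better


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: List[str], score: Callable[[str], Scores]) -> List[str]:
    """Return the candidates that no other candidate dominates."""
    scored = [(c, score(c)) for c in population]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


def evolve(
    population: List[str],
    propose: Callable[[str], str],
    score: Callable[[str], Scores],
    generations: int,
    size: int,
) -> List[str]:
    """Run a simple generational loop: propose children, then keep the best `size`."""
    for _ in range(generations):
        children = [propose(parent) for parent in population]
        # Merge parents and children, dropping duplicates but keeping order.
        pool = list(dict.fromkeys(population + children))
        front = pareto_front(pool, score)
        # Assumption: fill remaining slots by summed score (a placeholder rule).
        rest = sorted(
            (c for c in pool if c not in front),
            key=lambda c: sum(score(c)),
            reverse=True,
        )
        population = (front + rest)[:size]
    return population


# Hypothetical stand-ins: a real system would modify and score actual molecules.
_rng = random.Random(0)


def toy_propose(candidate: str) -> str:
    """Stub for the agentic operator: append one random symbol."""
    return candidate + _rng.choice("CNO")


def toy_score(candidate: str) -> Scores:
    """Two competing toy objectives: more 'N' symbols, shorter string."""
    return (float(candidate.count("N")), -float(len(candidate)))


if __name__ == "__main__":
    print(evolve(["C"], toy_propose, toy_score, generations=3, size=4))
```

The two toy objectives deliberately pull in opposite directions, so the surviving population is a set of trade-offs rather than a single best candidate, which is the situation a multi-objective method is meant to handle.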
Problem

Research questions and friction points this paper is trying to address.

molecular generation
invalid ligands
low-quality candidates
syntactic limitations
drug discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

evolutionary agentic framework
multi-objective drug discovery
LLM-based molecular design
tool-augmented reasoning
binding affinity optimization
Andrew Y. Zhou
Department of Computer Science and Engineering, UC San Diego, La Jolla, California, United States
Sharvaree Vadgama
University of Amsterdam
Generative Models, Geometric deep learning, Machine Learning
Sumanth Varambally
Department of Computer Science and Engineering, UC San Diego, La Jolla, California, United States
Peter Eckmann
Department of Computer Science, Stanford University, Stanford, California
Michael K. Gilson
Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, United States
Rose Yu
Associate Professor, University of California, San Diego
Machine Learning, Computational Sustainability