Efficient Evolutionary Search Over Chemical Space with Large Language Models

📅 2024-06-23
🏛️ arXiv.org
📈 Citations: 9
Influential: 1

🤖 AI Summary
In molecular discovery, optimization objectives are often non-differentiable black boxes. Evolutionary algorithms (EAs) search them with random mutations and crossovers, which requires many expensive objective evaluations. To address this, the authors propose MOLLEO, a framework that redesigns an EA's crossover and mutation operators around chemistry-aware large language models (LLMs) trained on large corpora of chemical information. Instead of applying random edits, the LLM proposes chemically informed candidate molecules (e.g., as SMILES strings), and the EA's selection step keeps the search directed by the objective. Experiments with both commercial and open-source LLMs on property optimization, molecular rediscovery, and structure-based drug design show that the approach outperforms all baseline models in single- and multi-objective settings. It improves both final solution quality and convergence speed, thereby reducing the number of objective evaluations required.
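
In multi-objective settings, an EA's selection step has to rank candidates that trade objectives off against each other. A minimal sketch of one standard building block, Pareto non-dominated filtering (the first front in NSGA-II-style sorting), is below. It is a generic technique assumed for illustration, not necessarily the exact selection rule used in this work; `dominates`, `pareto_front`, and the toy molecules and scores are hypothetical.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is at least as good as `b` on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Sequence[str], scores: Sequence[Scores]) -> List[str]:
    """Return the candidates whose score vectors no other candidate dominates."""
    front = []
    for i, si in enumerate(scores):
        if not any(dominates(sj, si) for j, sj in enumerate(scores) if j != i):
            front.append(candidates[i])
    return front


# Hypothetical candidates, each scored on two objectives to maximize.
mols = ["m1", "m2", "m3", "m4"]
objs = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_front(mols, objs))  # ['m1', 'm2', 'm4']
```

Here `m3` is dropped because `m2` beats it on both objectives, while `m1`, `m2`, and `m4` each win on at least one objective against every other candidate.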

📝 Abstract
Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at http://github.com/zoom-wang112358/MOLLEO
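
The core idea above is that crossover and mutation are pluggable operators inside an otherwise standard EA. A minimal sketch of such a loop follows, assuming a toy string representation and a cheap stand-in objective. `CountingOracle`, `evolve`, `random_mutation`, and `random_crossover` are hypothetical names, and the random operators are the conventional baseline the paper improves on. In MOLLEO these would be replaced by operators with the same role that query a chemistry-aware LLM to edit molecules; that call is not shown because the exact prompts and models are not given here.

```python
import random
from typing import Callable, Dict, List

Objective = Callable[[str], float]
Mutate = Callable[[str, random.Random], str]
Crossover = Callable[[str, str, random.Random], str]


class CountingOracle:
    """Wraps an objective, caches results, and counts evaluations of unseen candidates."""

    def __init__(self, fn: Objective) -> None:
        self.fn = fn
        self.cache: Dict[str, float] = {}

    def __call__(self, candidate: str) -> float:
        if candidate not in self.cache:
            self.cache[candidate] = self.fn(candidate)  # the expensive step
        return self.cache[candidate]

    @property
    def calls(self) -> int:
        return len(self.cache)


def random_mutation(s: str, rng: random.Random) -> str:
    """Baseline operator: replace one position with a random token (toy alphabet)."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice("CNOS") + s[i + 1:]


def random_crossover(a: str, b: str, rng: random.Random) -> str:
    """Baseline operator: single-point crossover (parents need length >= 2)."""
    cut = rng.randrange(1, min(len(a), len(b)))
    return a[:cut] + b[cut:]


def evolve(
    population: List[str],
    oracle: CountingOracle,
    mutate: Mutate,
    crossover: Crossover,
    generations: int = 20,
    offspring_per_gen: int = 10,
    mutation_rate: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Elitist (mu + lambda) EA in which only the operators are swappable."""
    rng = random.Random(seed)
    population = list(dict.fromkeys(population))  # drop duplicate starting candidates
    size = len(population)
    for _ in range(generations):
        children = []
        for _ in range(offspring_per_gen):
            a, b = rng.sample(population, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        # Keep the best `size` unique candidates, scored through the counting oracle.
        pool = list(dict.fromkeys(population + children))
        population = sorted(pool, key=oracle, reverse=True)[:size]
    return population


if __name__ == "__main__":
    # Toy objective standing in for an expensive property oracle: fraction of "N" tokens.
    oracle = CountingOracle(lambda s: s.count("N") / len(s))
    start = ["CCCCCCCC", "CCOCCOCC", "CSCCSCCC", "OCCCCCCO"]
    best = evolve(start, oracle, random_mutation, random_crossover)
    print(best[0], oracle(best[0]), "evaluations:", oracle.calls)
```

Because `evolve` only touches the operators through their call signatures, swapping `random_mutation` for an LLM-backed editor changes which candidates are proposed without changing selection, and `oracle.calls` tracks the quantity the paper aims to reduce: the number of objective evaluations.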
Problem

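Molecular optimization objectives can be non-differentiable, so black-box methods such as evolutionary algorithms are used.
EAs explore chemical space with random mutations and crossovers, which requires many expensive objective evaluations.
The goal is to improve solution quality and convergence speed while reducing the number of evaluations.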
Innovation

Integrates chemistry-aware LLMs into evolutionary algorithms
Redesigns EA crossover and mutation operators using LLMs trained on chemical data
Improves final solution quality and convergence speed, reducing required objective evaluations