MolEvolve: LLM-Guided Evolutionary Search for Interpretable Molecular Optimization

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of deep learning in molecular optimization: poor interpretability and difficulty handling activity cliffs, where minor structural modifications cause abrupt changes in molecular properties. It proposes MolEvolve, a large language model (LLM)-guided framework for autonomous look-ahead planning that formulates molecular discovery as a symbolic evolutionary search problem. The LLM generates and evolves an executable library of chemical operations, and Monte Carlo Tree Search (MCTS) combined with cheminformatics tools such as RDKit performs planning at test time. The framework produces transparent, human-readable chemical reasoning chains without manual feature engineering, aiming to capture discontinuities in structure-activity relationships. Experimental results show that the method outperforms baselines in both property prediction and molecule optimization while providing interpretable optimization pathways and chemical insights.
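To make the notion of an activity cliff concrete, the following is a minimal sketch and not the paper's method. It flags pairs of compounds whose fingerprints are highly similar yet whose measured activities differ sharply. The bit sets, activity values, and both thresholds (`sim_threshold`, `activity_gap`) are illustrative assumptions; a real pipeline would compute fingerprints with a cheminformatics toolkit such as RDKit.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def find_activity_cliffs(compounds, sim_threshold=0.8, activity_gap=2.0):
    """Return pairs that are structurally similar but far apart in activity.

    compounds maps a name to (fingerprint bit set, activity value).
    Both thresholds are assumed values chosen for illustration only.
    """
    cliffs = []
    for (n1, (fp1, a1)), (n2, (fp2, a2)) in combinations(compounds.items(), 2):
        sim = tanimoto(fp1, fp2)
        gap = abs(a1 - a2)
        if sim >= sim_threshold and gap >= activity_gap:
            cliffs.append((n1, n2, sim, gap))
    return cliffs

# Hypothetical toy data: integer sets stand in for real fingerprints.
toy = {
    "A": (frozenset(range(0, 10)), 8.5),
    "B": (frozenset(range(0, 9)) | {10}, 5.0),  # differs from A in one bit
    "C": (frozenset(range(20, 30)), 8.4),       # unrelated scaffold
}

cliffs = find_activity_cliffs(toy)
for n1, n2, sim, gap in cliffs:
    print(f"{n1}-{n2}: similarity={sim:.2f}, activity gap={gap:.1f}")
```

Here A and B share nine of eleven distinct bits but differ by 3.5 activity units, so they form the only flagged pair. A model that relies purely on the similarity principle would tend to predict nearly identical activity for both.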

📝 Abstract
Despite deep learning's success in chemistry, its impact is hindered by a lack of interpretability and an inability to resolve activity cliffs, where minor structural nuances trigger drastic property shifts. Current representation learning, bound by the similarity principle, often fails to capture these structure-activity discontinuities. To address this, we introduce MolEvolve, an evolutionary framework that reformulates molecular discovery as an autonomous, look-ahead planning problem. Unlike traditional methods that depend on human-engineered features or rigid prior knowledge, MolEvolve leverages a Large Language Model (LLM) to actively explore and evolve a library of executable chemical symbolic operations. By using the LLM for a cold start and a Monte Carlo Tree Search (MCTS) engine for test-time planning with external tools (e.g., RDKit), the system discovers optimal trajectories autonomously. This process evolves transparent reasoning chains that translate complex structural transformations into actionable, human-readable chemical insights. Experimental results demonstrate that MolEvolve's autonomous search not only yields transparent, human-readable chemical insights, but also outperforms baselines in both property prediction and molecule optimization tasks.
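The planning loop described in the abstract can be illustrated with a generic MCTS over a small operation library. This is a toy sketch under stated assumptions, not MolEvolve itself: the `OPERATIONS` table is hypothetical (in the paper the library is proposed by the LLM), the operations are plain string appends on a SMILES-like string, and `toy_score` is a placeholder objective where a real system would call a property predictor and validate each edit with a tool such as RDKit.

```python
import math
import random

# Hypothetical operation library (assumption): string edits stand in for
# the executable chemical operations that the LLM would generate.
OPERATIONS = {
    "add_methyl": lambda s: s + "C",
    "add_hydroxyl": lambda s: s + "O",
    "add_amine": lambda s: s + "N",
}

def toy_score(s):
    """Placeholder objective (assumption): reward N and O, penalize length."""
    return s.count("N") + 0.5 * s.count("O") - 0.1 * len(s)

class Node:
    def __init__(self, state, parent=None, op=None):
        self.state, self.parent, self.op = state, parent, op
        self.children, self.visits, self.total = [], 0, 0.0
        self.untried = list(OPERATIONS)  # operations not yet expanded here

    def ucb_child(self, c=1.4):
        """Pick the child maximizing the UCB1 score."""
        return max(
            self.children,
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )

def rollout(state, depth, rng):
    """Apply random operations for `depth` steps, then score the result."""
    for _ in range(depth):
        state = OPERATIONS[rng.choice(list(OPERATIONS))](state)
    return toy_score(state)

def mcts(root_state, iterations=200, max_depth=3, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node, depth = root, 0
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while not node.untried and node.children and depth < max_depth:
            node, depth = node.ucb_child(), depth + 1
        # 2. Expansion: try one unexplored operation.
        if node.untried and depth < max_depth:
            op = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(OPERATIONS[op](node.state), node, op)
            node.children.append(child)
            node, depth = child, depth + 1
        # 3. Simulation: random rollout down to the depth limit.
        reward = rollout(node.state, max_depth - depth, rng)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Read out the most-visited path as a human-readable operation chain.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        path.append(node.op)
    return path, node.state

path, final_state = mcts("C")
print(" -> ".join(path), "=>", final_state)
```

The returned list of operation names is the kind of artifact the abstract calls a reasoning chain: each step is a named, executable edit, so the route from the starting structure to the final one can be inspected and replayed.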
Problem

Research questions and friction points this paper is trying to address.

interpretability
activity cliffs
molecular optimization
representation learning
structure-activity relationship
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-guided evolution
interpretable molecular optimization
Monte Carlo Tree Search
activity cliffs
symbolic chemical operations
Xiangsen Chen
Hong Kong Polytechnic University
Ruilong Wu
Hong Kong University of Science and Technology (Guangzhou)
Yanyan Lan
Tsinghua University
Information Retrieval, Machine Learning, AI4Science
Ting Ma
Harbin Institute of Technology (Shenzhen)
Computational neuroscience, neuroimage, brain-computer-interface, medical image analysis
Yang Liu
Hong Kong Polytechnic University