Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis

📅 2026-01-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the trade-off between small local chemical language models—prone to hallucination and limited knowledge—and large cloud-based models, which incur high costs and pose privacy risks. We propose ChemCRAFT, a framework that decouples chemical reasoning from knowledge storage via agent-based reinforcement learning, enabling compact local models to accurately retrieve information through a sandboxed suite of chemical tools. To support this approach, we introduce ChemToolDataset, the first large-scale dataset of chemical tool-calling trajectories, and design a SMILES-guided reward mechanism (SMILES-GRPO) with dense rewards. Our results demonstrate that scientific reasoning capabilities can be acquired through learned tool orchestration strategies rather than model scale alone. ChemCRAFT outperforms leading cloud-based large models in molecular analysis, optimization, and retrosynthesis prediction, achieving high performance with low cost and strong privacy guarantees for AI-assisted chemical research.
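The idea of keeping knowledge outside the model and reaching it through tools can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tool name `molecular_weight`, the lookup table, and the `run_tool_call` helper are hypothetical and are not taken from the ChemCRAFT sandbox, whose actual tool interface is not described on this page.

```python
from typing import Callable, Dict

# Hypothetical lookup table standing in for an external chemical knowledge source.
# Keys are SMILES strings; values are molecular weights in g/mol.
_MOLECULAR_WEIGHTS = {"CCO": 46.07, "O": 18.02}


def lookup_molecular_weight(smiles: str) -> str:
    """Toy tool: return a stored molecular weight, or report a miss."""
    weight = _MOLECULAR_WEIGHTS.get(smiles)
    return f"{weight} g/mol" if weight is not None else "unknown molecule"


# Registry of tools exposed to the model; the names are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "molecular_weight": lookup_molecular_weight,
}


def run_tool_call(tool_name: str, argument: str) -> str:
    """Dispatch one tool call and return an observation string for the model."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"error: unknown tool '{tool_name}'"
    return tool(argument)
```

In this toy setup the model only needs to decide which tool to call and with what argument; the factual answer comes from the tool rather than from memorized weights.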

📝 Abstract
Language models are revolutionizing the biochemistry domain, assisting scientists in drug design and chemical synthesis with high efficiency. Yet current approaches are caught between small language models, which are prone to hallucination and limited knowledge retention, and large cloud-based language models, which carry privacy risks and high inference costs. To bridge this gap, we introduce ChemCRAFT, a novel framework that uses agentic reinforcement learning to decouple chemical reasoning from knowledge storage. Instead of forcing the model to memorize vast chemical data, our approach lets the language model interact with a sandbox for precise information retrieval. This externalization of knowledge allows a locally deployable small model to achieve superior performance with minimal inference costs. To give small language models the ability to call agents, we build an agentic trajectory construction pipeline and a comprehensive chemical-agent sandbox. From the sandbox interactions, we construct ChemToolDataset, the first large-scale chemical tool trajectory dataset. We also propose SMILES-GRPO, which builds a dense chemical reward function to improve the model's ability to call chemical agents. Evaluations across diverse aspects of drug design show that ChemCRAFT outperforms current cloud-based LLMs in molecular structure analysis, molecular optimization, and synthesis pathway prediction, demonstrating that scientific reasoning is not solely an emergent ability of model scale but a learnable policy of tool orchestration. This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry, opening new avenues for accelerating molecular discovery with locally deployable agents. Code is available at https://github.com/HowardLi1984/ChemCraft.
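The abstract does not spell out how SMILES-GRPO computes its reward, so the following is only a rough sketch of the general GRPO idea: several candidate outputs are sampled for one prompt, each gets a scalar reward, and rewards are normalized within the group to form advantages. The reward used here, plain character-level string similarity between SMILES strings, is a placeholder assumption and not the paper's reward; a chemistry-aware measure on parsed structures would be the more natural choice. The function names and the use of the population standard deviation are likewise assumptions.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev
from typing import List


def string_similarity_reward(predicted: str, reference: str) -> float:
    """Placeholder dense reward in [0, 1] from character overlap (an assumption,
    not the reward used by SMILES-GRPO)."""
    return SequenceMatcher(None, predicted, reference).ratio()


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero-variance group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score three sampled candidates against a reference SMILES string.
reference = "CCO"
candidates = ["CCO", "CCN", "c1ccccc1"]
rewards = [string_similarity_reward(c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
```

Because the reward varies continuously with similarity instead of being a pass/fail signal, partially correct candidates still receive a graded advantage within the group, which is the usual motivation for a dense reward.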
Problem

Research questions and friction points this paper is trying to address.

molecular design
chemical language models
hallucination
privacy risk
inference cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic reinforcement learning
chemical language models
tool-augmented reasoning
small language models
molecular design
Hao Li
Harbin Institute of Technology, Shenzhen
Embodied AI
He Cao
International Digital Economy Academy (IDEA)
Shenyao Peng
School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
Zijing Liu
International Digital Economy Academy (IDEA)
Bing Feng
International Digital Economy Academy (IDEA)
Yu Wang
School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
Zhiyuan Yan
PhD student @ PKU
Multimodal, AIGC Detection, AIGC, AI4Science
Yonghong Tian
School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
Yu Li
IDEA
Computer Vision, Computational Photography, Generative AI
Li Yuan
Research Associate, University of Science & Technology of China (USTC)
Antibiotic resistance, Wastewater treatment, Environmental bioremediation, Anaerobic digestion, Fate of organic pollutants