Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis

📅 2026-01-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the trade-off between small, locally deployed chemical language models, which are prone to hallucination and retain limited knowledge, and large cloud-based models, which incur high inference costs and pose privacy risks. We propose ChemCRAFT, a framework that uses agentic reinforcement learning to decouple chemical reasoning from knowledge storage, so that a compact local model retrieves precise information through a sandboxed suite of chemical tools instead of memorizing it. To support this approach, we introduce ChemToolDataset, the first large-scale dataset of chemical tool-calling trajectories, and design SMILES-GRPO, a SMILES-guided reward mechanism with dense rewards. Our results indicate that scientific reasoning can be acquired as a learned tool-orchestration policy rather than through model scale alone. ChemCRAFT outperforms leading cloud-based large models in molecular analysis, molecular optimization, and retrosynthesis prediction, offering low-cost and privacy-preserving AI assistance for chemical research.

📝 Abstract

Language models are reshaping biochemistry, helping scientists design drugs and plan chemical syntheses more efficiently. Yet current approaches are caught between small language models, which are prone to hallucination and retain limited knowledge, and large cloud-based language models, which carry privacy risks and high inference costs. To bridge this gap, we introduce ChemCRAFT, a framework that uses agentic reinforcement learning to decouple chemical reasoning from knowledge storage. Instead of forcing the model to memorize vast amounts of chemical data, our approach lets the language model interact with a sandbox for precise information retrieval. Externalizing knowledge in this way allows a locally deployable small model to achieve superior performance at minimal inference cost. To give small language models agent-calling ability, we build an agentic trajectory construction pipeline and a comprehensive chemical-agent sandbox. From the sandbox interactions, we construct ChemToolDataset, the first large-scale chemical tool trajectory dataset. We also propose SMILES-GRPO, which builds a dense chemical reward function to strengthen the model's ability to call chemical agents. Evaluations across diverse aspects of drug design show that ChemCRAFT outperforms current cloud-based LLMs in molecular structure analysis, molecular optimization, and synthesis pathway prediction, demonstrating that scientific reasoning is not solely an emergent property of model scale but also a learnable policy of tool orchestration. This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry, opening new avenues for accelerating molecular discovery with locally deployable agents. Code available at https://github.com/HowardLi1984/ChemCraft.
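
The abstract names SMILES-GRPO and a dense reward but does not spell out either, so the following is only a rough sketch of the general idea behind GRPO-style training: several candidate answers are sampled for one prompt, each is scored, and each score is standardized against its own group. The reward here is a hypothetical stand-in (character-level similarity between SMILES strings using Python's difflib), not the reward used in ChemCRAFT, and the function names and example molecules are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from statistics import mean, pstdev


def dense_smiles_reward(predicted: str, reference: str) -> float:
    """Hypothetical dense reward in [0, 1]: character-level similarity of two SMILES strings.

    A stand-in only; a real chemical reward would compare structures, not characters.
    """
    return SequenceMatcher(None, predicted, reference).ratio()


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of sampled answers for one prompt (assumed data).
reference = "CCO"
group = ["CCO", "CCN", "c1ccccc1", "CC"]

rewards = [dense_smiles_reward(s, reference) for s in group]
advantages = group_relative_advantages(rewards)

for smiles, r, a in zip(group, rewards, advantages):
    print(f"{smiles:10s} reward={r:.3f} advantage={a:+.3f}")
```

Because the reward is graded rather than binary, partially correct answers such as "CC" still receive a useful signal, which is the usual motivation for a dense reward. The group-relative normalization means no separate value model is needed to estimate a baseline.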

Problem

Research questions and friction points this paper is trying to address.

molecular design

chemical language models

hallucination

privacy risk

inference cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic reinforcement learning

chemical language models

tool-augmented reasoning