🤖 AI Summary
Chemical AI agents frequently suffer from erroneous tool invocation and inefficient multi-tool coordination. To address these challenges, this paper introduces ChemHTS (Chemical Hierarchical Tool Stacking), a novel framework that establishes the first domain-specific hierarchical tool orchestration paradigm for chemistry. ChemHTS defines four interpretable stacking behaviors and employs a two-stage LLM optimization architecture (tool self-stacking warm-up followed by multi-layer decision optimization) to enable collaborative reasoning and self-correcting inference across heterogeneous toolchains (e.g., molecular modeling, reaction rule application, and property prediction). Evaluated on four core chemical AI tasks (molecular design, reaction prediction, property estimation, and retrosynthetic planning), ChemHTS consistently outperforms strong baselines including GPT-4o, DeepSeek-R1, and the specialized model ChemDFM. All code and datasets are publicly released to foster reproducibility and community advancement.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable potential in scientific research, particularly in chemistry-related tasks such as molecular design, reaction prediction, and property estimation. While tool-augmented LLMs have been introduced to enhance reasoning and computation in these domains, existing approaches suffer from tool invocation errors and lack effective collaboration among diverse tools, which limits their overall performance. To address these challenges, we propose ChemHTS (Chemical Hierarchical Tool Stacking), a novel method that optimizes tool invocation pathways through a hierarchical stacking strategy. ChemHTS consists of two key stages, tool self-stacking warmup and multi-layer decision optimization, which enable LLMs to refine tool usage dynamically. We evaluate ChemHTS across four classical chemistry tasks and demonstrate its superiority over strong baselines, including GPT-4o, DeepSeek-R1, and chemistry-specific models such as ChemDFM. Furthermore, we define four distinct tool-stacking behaviors to enhance interpretability, providing insights into the effectiveness of tool collaboration. Our dataset and code are publicly available at https://github.com/Chang-pw/ChemHTS.