🤖 AI Summary
Problem: Large language models (LLMs) are prone to cascading failures in multi-step chemical reasoning, where a minor early error propagates through every later step. They also struggle to handle domain-specific formulas precisely and to integrate code execution tightly with their reasoning.
Method: We propose a knowledge-augmented reasoning framework built around a dynamic, self-updating knowledge base. It defines three hierarchical memory types (factual, procedural, and strategic) and combines task decomposition, construction of a structured sub-task library, memory retrieval and refinement, and coordinated LLM-code execution.
Contribution/Results: On four chemical reasoning datasets from SciBench, the framework improves GPT-4's performance by up to 46% over baseline methods, substantially outperforming prior approaches. To our knowledge, this is the first work to enable experience-driven, continual improvement of chemical reasoning capability. The results suggest substantial potential for applications such as drug discovery and materials design.
📝 Abstract
Chemical reasoning usually involves complex, multi-step processes that demand precise calculations, where even minor errors can lead to cascading failures. Furthermore, large language models (LLMs) have difficulty handling domain-specific formulas, executing reasoning steps accurately, and integrating code effectively when tackling chemical reasoning tasks. To address these challenges, we present ChemAgent, a novel framework designed to improve the performance of LLMs through a dynamic, self-updating library. This library is built by decomposing chemical tasks into sub-tasks and compiling these sub-tasks into a structured collection that can be referenced for future queries. When presented with a new problem, ChemAgent retrieves and refines pertinent information from the library, which we call memory, to support effective task decomposition and solution generation. ChemAgent uses three types of memory and a library-enhanced reasoning component, enabling LLMs to improve over time through experience. Experimental results on four chemical reasoning datasets from SciBench demonstrate that ChemAgent achieves performance gains of up to 46% (GPT-4), significantly outperforming existing methods. Our findings suggest substantial potential for future applications, including tasks such as drug discovery and materials science. Our code can be found at https://github.com/gersteinlab/chemagent
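
The decompose, store, and retrieve loop described in the abstract can be illustrated with a small toy sketch. This is not ChemAgent's actual implementation (see the repository above for that). The names `MemoryEntry`, `Library`, `jaccard`, and `solve` are hypothetical, a simple token-overlap score stands in for whatever retrieval method the authors use, and the `solver` callback is a placeholder for an LLM call.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MemoryEntry:
    """One solved sub-task kept for reuse (hypothetical structure)."""
    subtask: str   # natural-language description of the sub-task
    solution: str  # worked solution stored alongside it


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Library:
    """A self-updating collection of solved sub-tasks."""
    entries: list[MemoryEntry] = field(default_factory=list)

    def add(self, subtask: str, solution: str) -> None:
        self.entries.append(MemoryEntry(subtask, solution))

    def retrieve(self, query: str, k: int = 1,
                 threshold: float = 0.2) -> list[MemoryEntry]:
        # Score every entry, drop weak matches, return the top k.
        scored = [(jaccard(query, e.subtask), e) for e in self.entries]
        scored = [(s, e) for s, e in scored if s >= threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def solve(subtasks: list[str], library: Library,
          solver: Callable[[str, list[MemoryEntry]], str]) -> list[str]:
    """Solve each sub-task with retrieved memory, then store the result."""
    answers = []
    for sub in subtasks:
        hints = library.retrieve(sub)   # look up similar past sub-tasks
        answer = solver(sub, hints)     # placeholder for the LLM call
        library.add(sub, answer)        # the library grows with experience
        answers.append(answer)
    return answers
```

In this sketch the library starts empty and grows as sub-tasks are solved, so a later query that resembles an earlier one receives that earlier solution as a hint. The decomposition step itself is assumed to happen upstream and is passed in as `subtasks`.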