🤖 AI Summary
High-throughput experimental data acquisition remains costly and narrow in scope. The chemical literature is vast and rich in reaction information, but heterogeneous writing styles, complex coreference, and multimodal representations make reliable structured extraction difficult. To address this, we propose the first end-to-end large language model (LLM)-based autonomous agent framework tailored for chemistry, integrating domain-specific knowledge constraints, dynamic prompt generation, iterative reasoning, and self-refinement mechanisms. The framework enables fully automated, high-fidelity extraction of critical reaction conditions, including catalysts, solvents, temperature, and time, from unstructured text. On reaction condition extraction, it achieves expert-level performance (>92% accuracy, recall, and F1-score), reduces inference latency by over 80%, and substantially outperforms existing baseline methods.
📝 Abstract
Chemical synthesis, which is crucial for advancing materials synthesis and drug discovery, impacts various sectors including environmental science and healthcare. The rise of technology in chemistry has generated extensive chemical data, challenging researchers to discern patterns and refine synthesis processes. Artificial intelligence (AI) helps by analyzing data to optimize synthesis and increase yields. However, AI struggles to process literature data because chemical literature is unstructured and written in diverse styles. To overcome these difficulties, we introduce an end-to-end AI agent framework capable of high-fidelity extraction from extensive chemical literature. This AI agent employs large language models (LLMs) for prompt generation and iterative optimization. It functions as a chemistry assistant, automating data collection and analysis, thereby saving manpower and enhancing performance. We evaluate the framework using the accuracy, recall, and F1 score of the extracted reaction condition data, and we compare our method with human experts in terms of content correctness and time efficiency. The proposed approach marks a significant advancement in automating chemical literature extraction and demonstrates the potential for AI to revolutionize data management and utilization in chemistry.
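As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of how recall and F1 could be computed for one reaction. It assumes that each extracted condition is compared to the expert annotation as an exact (field, value) pair after simple normalization. The abstract does not specify the matching criterion or how accuracy is defined, so the function name `score_extraction`, the normalization step, and the example values are hypothetical, and accuracy is left out.

```python
from typing import Dict, Iterable, Set, Tuple

# A reaction condition as a (field, value) pair, e.g. ("solvent", "THF").
Condition = Tuple[str, str]


def normalize(pairs: Iterable[Condition]) -> Set[Condition]:
    """Lowercase and strip whitespace so trivial formatting differences
    do not count as errors (an assumed, simplistic normalization)."""
    return {(field.strip().lower(), value.strip().lower()) for field, value in pairs}


def score_extraction(predicted: Iterable[Condition],
                     gold: Iterable[Condition]) -> Dict[str, float]:
    """Compute precision, recall, and F1 for one reaction by exact match
    of normalized (field, value) pairs against the expert annotation."""
    pred = normalize(predicted)
    ref = normalize(gold)
    true_positives = len(pred & ref)

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: the extractor gets the reaction time wrong.
gold = [
    ("catalyst", "Pd(PPh3)4"),
    ("solvent", "THF"),
    ("temperature", "80 °C"),
    ("time", "12 h"),
]
predicted = [
    ("catalyst", "Pd(PPh3)4"),
    ("solvent", "thf"),
    ("temperature", "80 °C"),
    ("time", "2 h"),
]

scores = score_extraction(predicted, gold)
print(scores)
```

In this example three of the four annotated conditions are recovered and three of the four predictions are correct, so precision, recall, and F1 all come out at 0.75. A real evaluation would likely need chemistry-aware matching (synonymous solvent names, unit conversion), which this sketch does not attempt.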