🤖 AI Summary
Chemical literature presents information in heterogeneous, multimodal, and stylistically varied forms (text, molecular images, and reaction diagrams), which limits the accuracy of automated extraction. To address this, we propose a multi-agent system built on a multimodal large language model (MLLM) that interprets complex chemical graphics, decomposes the extraction task into sub-tasks, and coordinates specialized agents to solve them. The system integrates agents for molecular image recognition, reaction image parsing, named entity recognition, and text-based reaction extraction, producing structured reaction data. On a benchmark dataset of complex chemical reaction graphics from the literature, our approach achieves an F1 score of 80.8%, a 45.2-percentage-point improvement over the previous state-of-the-art model (35.6%), with consistent improvements in each of these key sub-tasks. This work applies multi-agent collaboration to chemical information extraction and is a step toward constructing high-quality structured reaction datasets for AI-driven chemical research.
📝 Abstract
High-quality chemical databases are the cornerstone of AI-powered chemical research. Automatic extraction of chemical information from the literature is essential for constructing reaction databases, but it is currently limited by the multimodality and stylistic variability of chemical information. In this work, we developed a multimodal large language model (MLLM)-based multi-agent system for automatic chemical information extraction. We used the MLLM's strong reasoning capability to understand the structure of complex chemical graphics, decompose the extraction task into sub-tasks, and coordinate a set of specialized agents to solve them. Our system achieved an F1 score of 80.8% on a benchmark dataset of complex chemical reaction graphics from the literature, surpassing the previous state-of-the-art model (F1 score: 35.6%) by 45.2 percentage points. It also demonstrated consistent improvements in key sub-tasks, including molecular image recognition, reaction image parsing, named entity recognition, and text-based reaction extraction. This work is a critical step toward automated extraction of chemical information into structured datasets, which will strongly support AI-driven chemical research.
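The decompose-and-coordinate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`SubTask`, `AGENTS`, `plan`, `run`, and the four stub agents) is hypothetical, the agents return placeholder dictionaries instead of calling any model, and the planning step that the paper assigns to the MLLM is replaced by sub-task labels supplied up front.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubTask:
    kind: str     # which specialized agent should handle this piece
    payload: str  # reference to an image crop or a text span (hypothetical)


# Hypothetical stub agents; real agents would call recognition/extraction models.
def recognize_molecule(payload: str) -> dict:
    return {"type": "molecule", "source": payload}


def parse_reaction_image(payload: str) -> dict:
    return {"type": "reaction_image", "source": payload}


def tag_entities(payload: str) -> dict:
    return {"type": "entities", "source": payload}


def extract_text_reaction(payload: str) -> dict:
    return {"type": "text_reaction", "source": payload}


# Registry mapping each sub-task kind to the agent that solves it.
AGENTS: Dict[str, Callable[[str], dict]] = {
    "molecule_image": recognize_molecule,
    "reaction_image": parse_reaction_image,
    "ner": tag_entities,
    "text_reaction": extract_text_reaction,
}


def plan(parts: List[Tuple[str, str]]) -> List[SubTask]:
    """Stand-in for the MLLM planning step: labels are assumed to be given."""
    return [SubTask(kind, payload) for kind, payload in parts]


def run(parts: List[Tuple[str, str]]) -> List[dict]:
    """Dispatch every sub-task to its agent and collect structured records."""
    results = []
    for task in plan(parts):
        agent = AGENTS.get(task.kind)
        if agent is None:
            raise ValueError(f"no agent registered for sub-task kind: {task.kind}")
        results.append(agent(task.payload))
    return results
```

Calling `run([("molecule_image", "crop_0"), ("ner", "paragraph_2")])` returns one record per sub-task, in input order. Keeping the agents behind a registry is one way to get the modularity the abstract implies, since an agent can be swapped without touching the dispatch loop.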