MACReD: A Multi-Agent Collaborative Reasoning Framework for Reaction Diagram Parsing

📅 2026-05-27
🤖 AI Summary
This work addresses three challenges in parsing chemical reaction diagrams from scientific literature: layout heterogeneity, entangled visual elements, and the disconnect between recognition and reasoning. It introduces a hierarchical multi-agent framework in which specialized agents for molecular perception, arrow interpretation, text extraction, and reaction reconstruction are coordinated within a unified architecture guided by a vision-language model. A multigraph fusion mechanism integrates heterogeneous visual and textual cues while keeping the reconstructed reactions chemically consistent. On the RxnScribe benchmark, the method achieves hard-match and soft-match F1 scores of 75.2% and 84.6%, against 69.1% and 80.0% for the RxnScribe baseline, and remains robust on multi-step and tree-structured reaction schemes.
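As a rough illustration of how the four agent roles named above might hand results to one another, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Element` and `Reaction` types, the stub agents with hard-coded detections, and the left-to-right geometric rule in `reconstruct` are hypothetical and are not taken from MACReD, which relies on a vision-language model rather than fixed rules.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    kind: str   # "molecule", "arrow", or "text"
    label: str  # SMILES string, arrow id, or text content
    x: float    # horizontal centre; a toy 1-D stand-in for a bounding box


@dataclass
class Reaction:
    reactants: list = field(default_factory=list)
    conditions: list = field(default_factory=list)
    products: list = field(default_factory=list)


# Hypothetical perception agents: each returns hard-coded detections
# in place of a real detector or vision-language model call.
def molecule_agent(image):
    return [Element("molecule", "CCO", 0.0),
            Element("molecule", "CC=O", 2.0),
            Element("molecule", "CC(=O)O", 4.0)]


def arrow_agent(image):
    return [Element("arrow", "a1", 1.0), Element("arrow", "a2", 3.0)]


def text_agent(image):
    return [Element("text", "PCC", 1.1)]


def reconstruct(elements):
    """Toy reconstruction agent for a single left-to-right row."""
    arrows = sorted((e for e in elements if e.kind == "arrow"), key=lambda e: e.x)
    molecules = [e for e in elements if e.kind == "molecule"]
    texts = [e for e in elements if e.kind == "text"]
    reactions = []
    for i, arrow in enumerate(arrows):
        # Molecules between the previous arrow and this one are reactants;
        # molecules between this arrow and the next one are products.
        left = arrows[i - 1].x if i > 0 else float("-inf")
        right = arrows[i + 1].x if i + 1 < len(arrows) else float("inf")
        reactions.append(Reaction(
            reactants=[m.label for m in molecules if left < m.x < arrow.x],
            # A text label is a condition of the arrow it lies closest to.
            conditions=[t.label for t in texts
                        if min(arrows, key=lambda a: abs(a.x - t.x)) is arrow],
            products=[m.label for m in molecules if arrow.x < m.x < right],
        ))
    return reactions


def parse_diagram(image):
    """Orchestrator: run the perception agents, then reconstruct reactions."""
    elements = molecule_agent(image) + arrow_agent(image) + text_agent(image)
    return reconstruct(elements)


reactions = parse_diagram(image=None)  # the stubs ignore the image argument
```

In this toy run the intermediate `CC=O` is the product of the first step and the reactant of the second, which is the kind of linkage a multi-step scheme requires. A tree-structured or multi-row layout would break the one-dimensional rule used here.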
📝 Abstract
Parsing chemical reaction diagrams from scientific literature is challenging due to heterogeneous layouts, intertwined visual elements, and the difficulty of integrating recognition and reasoning. Existing vision-language models advance multimodal understanding but still fail on complex diagrams, struggling to maintain spatial coherence and to integrate multidimensional information during reasoning. To address these issues, we propose MACReD, a hierarchical multi-agent framework that coordinates specialized agents for molecular perception, arrow understanding, text extraction, and reaction reconstruction within a unified VLM-guided architecture. The planning and perception layers use flexible, fine-grained detection to handle visual complexity, while the reasoning layer uses a multigraph fusion mechanism to integrate heterogeneous cues and enforce chemically consistent global reasoning. Experiments on the RxnScribe benchmark show that MACReD achieves state-of-the-art performance, with F1 scores of 75.2% and 84.6% under hard and soft match criteria, outperforming the RxnScribe baseline, which obtains 69.1% and 80.0%, respectively. These results demonstrate the robustness of MACReD across diverse diagram layouts, including multi-step and tree-structured reactions.
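To make the reported metric concrete, the following Python sketch computes F1 over predicted and gold reaction sets under two matching rules. The exact hard and soft criteria are defined by the RxnScribe benchmark and are not spelled out in this abstract. The sketch therefore assumes, purely for illustration, that a hard match requires reactants, conditions, and products to agree, while a soft match compares reactants and products only. The helper names `rxn`, `hard_key`, and `soft_key` are hypothetical.

```python
def f1(predicted, gold, key):
    """F1 = 2PR / (P + R), where a prediction counts as correct if its key is in gold."""
    pred_keys = {key(r) for r in predicted}
    gold_keys = {key(r) for r in gold}
    true_pos = len(pred_keys & gold_keys)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_keys)
    recall = true_pos / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


def rxn(reactants, conditions, products):
    # Order-insensitive, hashable representation of one reaction.
    return (frozenset(reactants), frozenset(conditions), frozenset(products))


def hard_key(r):
    # Assumed hard criterion: all three roles must agree.
    return r


def soft_key(r):
    # Assumed soft criterion: ignore conditions, compare molecules only.
    return (r[0], r[2])


gold = [rxn(["CCO"], ["PCC"], ["CC=O"]),
        rxn(["CC=O"], [], ["CC(=O)O"])]
# The first prediction misses the condition label; the second is exact.
predicted = [rxn(["CCO"], [], ["CC=O"]),
             rxn(["CC=O"], [], ["CC(=O)O"])]

hard_f1 = f1(predicted, gold, hard_key)  # 0.5: one of two reactions matches fully
soft_f1 = f1(predicted, gold, soft_key)  # 1.0: both match once conditions are ignored
```

Under these assumed rules the soft score can never be lower than the hard score for the same predictions, which is consistent with the ordering of the reported numbers (84.6% soft versus 75.2% hard).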
Problem

Research questions and friction points this paper is trying to address.

reaction diagram parsing
multimodal understanding
spatial coherence
heterogeneous layouts
chemical reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent collaboration
reaction diagram parsing
multigraph fusion
vision-language model
chemical reasoning