Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Most chemical reaction data are not available in machine-readable form, which severely hinders AI applications in organic chemistry; automated parsing of reaction images has long been impeded by scarce annotated data and inadequate model architectures. This paper introduces RxnIM, the first multimodal large language model specifically designed for chemical reaction understanding. It integrates chemically aware tokenization, a reaction-graph-specific synthetic data generation framework, and instruction-based fine-tuning. Built upon a ViT visual encoder and a multimodal large language model (MLLM) architecture, RxnIM performs end-to-end, joint structural parsing of reaction images and the textual condition information they contain. On multiple benchmark datasets, it achieves an average F1 score of 88%, outperforming the prior state of the art by 5 percentage points. The project open-sources the model weights, training code, synthetic and real-world datasets, and an interactive web application, establishing a foundational data infrastructure for AI-driven chemistry research.

📝 Abstract
Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with a specially designed large-scale dataset generation method to support model training, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing literature methods by 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses. An instance of the RxnIM web application can be accessed at https://huggingface.co/spaces/CYF200127/RxnIM.
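To make the terms "machine-readable reaction data" and "F1 score" concrete, here is a minimal sketch of how a parsed reaction could be represented and scored. Everything in it is an assumption for illustration: the `Reaction` record, its field names, the SMILES strings, and the exact-match criterion are hypothetical and are not taken from the RxnIM code or its evaluation protocol, which may score components differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical machine-readable record for one parsed reaction."""
    reactants: frozenset   # e.g. SMILES strings of the reactants
    conditions: frozenset  # e.g. reagents, solvents, temperature text
    products: frozenset    # e.g. SMILES strings of the products


def f1_score(predicted, gold):
    """F1 over reactions; a prediction counts only if it matches a gold record exactly.

    Exact matching is a simplifying assumption made for this sketch.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; these are not records from any benchmark.
gold = [
    Reaction(frozenset({"CCO"}), frozenset({"H2SO4", "heat"}), frozenset({"C=C"})),
    Reaction(frozenset({"CC=O"}), frozenset({"NaBH4"}), frozenset({"CCO"})),
]
predicted = [
    gold[0],
    # Same molecules, but the condition text was read incorrectly.
    Reaction(frozenset({"CC=O"}), frozenset({"LiAlH4"}), frozenset({"CCO"})),
]

score = f1_score(predicted, gold)
print(f"F1 = {score:.2f}")
```

In this toy case one of the two predictions matches, so precision and recall are both 0.5. A single misread condition is enough to invalidate an otherwise correct reaction under exact matching, which is why joint parsing of structures and condition text matters for the reported score.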
Problem

Research questions and friction points this paper is trying to address.

Automate parsing of chemical reaction images into machine-readable data.
Overcome reliance on manual curation for chemical reaction data extraction.
Enhance AI applications in chemistry with structured reaction databases.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal large language model for chemical image parsing
Automated extraction of machine-readable reaction data
High-performance dataset generation for model training
Yufan Chen
Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China
Ching Ting Leung
Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China
Jianwei Sun
Professor of Department of Physics and Engineering Physics, Tulane University
Density Functional Theory, Condensed Matter Physics, Chemistry, and Materials Science
Yong Huang
Department of Chemistry, Hong Kong University of Science and Technology, Hong Kong SAR, China
Linyan Li
Department of Data Science, City University of Hong Kong, Hong Kong SAR, China
Hao Chen
Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China; Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China
Hanyu Gao
MIT
Kinetic Modeling, Simulation and Optimization, Machine Learning