🤖 AI Summary
Automated extraction of molecular structures and reaction data from scientific literature is hindered by the diversity of chemical representations, unstructured document formats, and complex page layouts. This paper introduces the first end-to-end vision-driven framework that jointly performs molecule instance detection, reaction graph topology parsing, and optical chemical structure recognition (OCSR) directly from full-page PDFs or images. The three tasks are unified within a single model architecture, accompanied by the first page-level benchmark dataset of 550 annotated pages with MOLfile ground truth and dedicated evaluation metrics. The method integrates document layout-aware detection with reaction graph structural reasoning, achieving state-of-the-art performance on this benchmark and on multiple public datasets. The code, pre-trained models, and an interactive online demo will be publicly released.
📝 Abstract
The extraction of molecular structures and reaction data from scientific documents is challenging due to their varied, unstructured chemical formats and complex document layouts. To address this, we introduce MolMole, a vision-based deep learning framework that unifies molecule detection, reaction diagram parsing, and optical chemical structure recognition (OCSR) into a single pipeline for automating the extraction of chemical data directly from page-level documents. Recognizing the lack of a standard page-level benchmark and evaluation metric, we also present a test set of 550 pages annotated with molecule bounding boxes, reaction labels, and MOLfiles, along with a novel evaluation metric. Experimental results demonstrate that MolMole outperforms existing toolkits on both our benchmark and public datasets. The benchmark test set will be publicly available, and the MolMole toolkit will soon be accessible through an interactive demo on the LG AI Research website. For commercial inquiries, please contact us at contact_ddu@lgresearch.ai.