🤖 AI Summary
Existing optical chemical structure recognition (OCSR) systems perform poorly on real-world scientific literature because of complex visual interference and challenging chemical semantics. To address this gap, this work proposes MolRecBench-Wild, the first benchmark for chemical structure recognition in the wild, built on MOSAIC, a two-dimensional difficulty framework. The benchmark comprises 5,029 structures annotated with 37 fine-grained attributes. The study also introduces CARBON, a novel representation language capable of expressing unconventional chemical semantics, and devises a dual-track evaluation protocol compatible with both CARBON and SMILES. A systematic evaluation of 18 state-of-the-art OCSR models reveals a significant performance drop in authentic academic settings, underscoring a substantial gap between current methods and the requirements of practical deployment.
📝 Abstract
Optical Chemical Structure Recognition (OCSR) aims to translate molecular diagrams in scientific literature into machine-readable formats, but current systems remain unreliable on real-world images due to substantial visual and chemical complexity. We introduce MOSAIC, a dual-dimensional difficulty framework with 37 fine-grained labels that jointly characterize visual interference and chemical semantic challenges in molecular diagrams. Based on this framework, we construct MolRecBench-Wild, a benchmark of 5,029 structures from 820 recent chemistry papers, covering the full difficulty spectrum observed in real publications. To enable faithful semantic evaluation beyond SMILES and MolFile, we propose CARBON, a representation language capable of expressing valence variations, icon-based groups, and other non-standard chemical semantics. We further adopt a dual-track evaluation protocol supporting both CARBON and SMILES outputs for broad model compatibility. Comprehensive experiments over 18 OCSR-capable models reveal severe performance degradation on MolRecBench-Wild, exposing a large gap between previous patent benchmarks and real-world academic scenarios.