RFL: Simplifying Chemical Structure Recognition with Ring-Free Language

📅 2024-12-10
🏛️ arXiv.org
🤖 AI Summary
Existing chemical structure recognition methods struggle to learn one-dimensional markup directly from images of 2D molecular structures, especially those with rings and multiple branches. To address this, we propose Ring-Free Language (RFL), a hierarchical representation that uses a divide-and-conquer strategy to decouple a molecule into an acyclic skeleton and separate ring and branch components, ensuring uniqueness and conciseness while improving readability. We further design the Molecular Skeleton Decoder (MSD), which comprises two modules: a skeleton generation module that progressively predicts the molecular skeleton and the individual rings, and a branch classification module that predicts branch information. RFL and MSD can be applied to various mainstream recognition methods, and they outperform state-of-the-art approaches on both printed and handwritten chemical structure recognition benchmarks. The source code is publicly available.

📝 Abstract
The primary objective of Optical Chemical Structure Recognition is to convert chemical structure images into the corresponding markup sequences. However, the complex two-dimensional structures of molecules, particularly those with rings and multiple branches, make it difficult for current end-to-end methods to learn the one-dimensional markup directly. To overcome this limitation, we propose a novel Ring-Free Language (RFL), which uses a divide-and-conquer strategy to describe chemical structures in a hierarchical form. RFL allows complex molecular structures to be decomposed into multiple parts, ensuring both uniqueness and conciseness while enhancing readability. This approach significantly reduces the learning difficulty for recognition models. Leveraging RFL, we propose a universal Molecular Skeleton Decoder (MSD), which comprises a skeleton generation module that progressively predicts the molecular skeleton and individual rings, along with a branch classification module for predicting branch information. Experimental results demonstrate that the proposed RFL and MSD can be applied to various mainstream methods, achieving superior performance compared to state-of-the-art approaches in both printed and handwritten scenarios. The code is available at https://github.com/JingMog/RFL-MSD.
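To make the divide-and-conquer idea concrete, the sketch below splits a small molecular graph into an acyclic skeleton plus a list of ring components. It is only an illustration under stated assumptions, not the authors' RFL grammar or the MSD decoder: it uses a standard bridge-finding pass (bonds that lie on no cycle) and collapses each connected ring system into a placeholder node. The function names `find_bridges` and `decompose`, the `R0`/`A6` labels, and the atom-index input format are all hypothetical.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return indices of bonds that lie on no ring (graph bridges)."""
    adj = defaultdict(list)
    for idx, (a, b) in enumerate(bonds):
        adj[a].append((b, idx))
        adj[b].append((a, idx))

    disc, low, bridges = {}, {}, set()
    counter = 0

    def dfs(u, parent_edge):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue  # do not walk back along the bond we came from
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge closes a ring
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(idx)  # no ring passes through this bond

    for atom in range(n_atoms):
        if atom not in disc:
            dfs(atom, -1)
    return bridges


def decompose(n_atoms, bonds):
    """Split a molecular graph into (acyclic skeleton edges, ring atom lists)."""
    bridges = find_bridges(n_atoms, bonds)

    # Union-find: atoms joined by ring bonds belong to the same ring system.
    parent = list(range(n_atoms))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for idx, (a, b) in enumerate(bonds):
        if idx not in bridges:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for atom in range(n_atoms):
        groups[find(atom)].append(atom)
    rings = sorted(sorted(g) for g in groups.values() if len(g) > 1)

    # Ring systems become placeholder nodes R0, R1, ...; other atoms keep A<i>.
    label = {atom: f"R{i}" for i, ring in enumerate(rings) for atom in ring}
    for atom in range(n_atoms):
        label.setdefault(atom, f"A{atom}")

    skeleton = sorted(
        {tuple(sorted((label[a], label[b])))
         for idx, (a, b) in enumerate(bonds) if idx in bridges}
    )
    return skeleton, rings


# Toluene-like graph: a six-membered ring (atoms 0-5) with one substituent (atom 6).
toluene_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
skeleton, rings = decompose(7, toluene_bonds)
print(skeleton)  # [('A6', 'R0')]
print(rings)     # [[0, 1, 2, 3, 4, 5]]
```

Because every remaining skeleton edge is a bridge, the collapsed graph is a tree, which is the property that makes a sequential description easier to produce. Fused rings end up in a single component here; how RFL itself orders and serializes rings is defined in the paper and the linked repository, not by this sketch.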
Problem

Research questions and friction points this paper is trying to address.

Complex Molecular Structure Recognition
Chemical Structure Identification
Ring and Branching Structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ring-Free Language (RFL)
Molecular Skeleton Decoder (MSD)
Chemical Structure Recognition
Authors

Qikai Chang
University of Science and Technology of China
OCR, LLM
Mingjun Chen
iFLYTEK Research
Changpeng Pi
iFLYTEK Research
Pengfei Hu
University of Science and Technology of China
Zhenrong Zhang
University of Science and Technology of China
Jie Ma
University of Science and Technology of China
Jun Du
University of Science and Technology of China
Baocai Yin
Unknown affiliation
Jinshui Hu
iFLYTEK Research