log-RRIM: Yield Prediction via Local-to-global Reaction Representation Learning and Interaction Modeling

📅 2024-10-20
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low prediction accuracy and high trial-and-error cost in organic reaction yield forecasting, this paper proposes a novel Graph Transformer model that introduces a “local-to-global” reaction representation learning paradigm. Methodologically, it integrates multi-scale molecular graph encoding with reaction-center-aware subgraph aggregation and incorporates a reagent–reaction-center cross-attention mechanism to explicitly model dynamic microscopic interactions during bond cleavage and formation. Evaluated on multiple benchmark datasets, the proposed approach significantly outperforms state-of-the-art methods: it reduces prediction error by 18.7% for reactions with medium-to-high yields (>60%), demonstrating superior generalization capability and practical utility for synthetic route design and optimization.

📝 Abstract
Accurate prediction of chemical reaction yields is crucial for optimizing organic synthesis, potentially reducing time and resources spent on experimentation. With the rise of artificial intelligence (AI), there is growing interest in leveraging AI-based methods to accelerate yield predictions without conducting in vitro experiments. We present log-RRIM, an innovative graph transformer-based framework designed for predicting chemical reaction yields. Our approach implements a unique local-to-global reaction representation learning strategy. It first captures detailed molecule-level information and then models and aggregates intermolecular interactions, ensuring that the impact of molecular fragments of varying sizes on yield is accurately accounted for. Another key feature of log-RRIM is its integration of a cross-attention mechanism that focuses on the interplay between reagents and reaction centers. This design reflects a fundamental principle in chemical reactions: reagents play a crucial role in influencing bond-breaking and bond-forming processes, which ultimately affect reaction yields. log-RRIM outperforms existing methods in our experiments, especially for medium- to high-yielding reactions, demonstrating its reliability as a predictor. Its advanced modeling of reactant-reagent interactions and sensitivity to small molecular fragments make it a valuable tool for reaction planning and optimization in chemical synthesis. The data and code of log-RRIM are available at https://github.com/ninglab/YieldlogRRIM.
Problem

Research questions and friction points this paper is trying to address.

Predict chemical reaction yields using AI to optimize synthesis.
Model reagent-reaction center interactions for accurate yield prediction.
Capture molecular fragment contributions through local-to-global representation learning.
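The local-to-global idea in the last point can be illustrated with a toy sketch. It is not the log-RRIM implementation: plain mean pooling stands in for the learned encoders and aggregators, and the function names (`mean_pool`, `local_to_global`, `global_only`) and the two-dimensional embeddings are hypothetical. It only shows why pooling within each molecule first keeps a small fragment from being swamped by a large molecule.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def local_to_global(molecules):
    """Pool atoms within each molecule first, then pool across molecules."""
    return mean_pool([mean_pool(atoms) for atoms in molecules])

def global_only(molecules):
    """Baseline: pool over every atom in the reaction at once."""
    return mean_pool([atom for atoms in molecules for atom in atoms])

# Made-up atom embeddings: a 9-atom molecule and a 1-atom fragment.
large = [[1.0, 0.0] for _ in range(9)]
small = [[0.0, 1.0]]
reaction = [large, small]

two_stage = local_to_global(reaction)  # fragment weighs as much as the large molecule
one_stage = global_only(reaction)      # fragment contributes only 1 atom in 10
```

In the one-stage baseline the fragment's signal shrinks with the atom count of the other molecules; in the two-stage version each molecule contributes equally before the reaction-level aggregation.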
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph transformer-based framework for yield prediction
Cross-attention mechanism for reagent-reaction center interplay
Local-to-global reaction representation learning strategy
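As a rough illustration of the cross-attention point above, the sketch below applies standard scaled dot-product attention with reaction-center atoms as queries and reagent atoms as keys and values. This is a generic textbook formulation under assumed inputs, not the authors' architecture: the embeddings are made up, there are no learned projection matrices, and names such as `cross_attention`, `center_atoms`, and `reagent_atoms` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: reaction-center atom embeddings (list of d-dim vectors)
    keys, values: reagent atom embeddings (lists of vectors)
    Returns one attended vector per query and the attention weights.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this reaction-center atom to every reagent atom.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of reagent value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy embeddings, invented for illustration only.
center_atoms = [[1.0, 0.0], [0.0, 1.0]]
reagent_atoms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
attended, attn = cross_attention(center_atoms, reagent_atoms, reagent_atoms)
```

Each row of `attn` sums to 1 and indicates how strongly a given reaction-center atom attends to each reagent atom; a trained model would add learned query, key, and value projections and typically multiple heads.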
Xiao Hu
Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
Ziqi Chen
Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
Daniel Adu-Ampratwum
Research Assistant Professor, Ohio State University
Organic Chemistry, Natural Product Synthesis, Medicinal Chemistry, Drug Discovery
B. Peng
Computer Science and Engineering, The Ohio State University, Columbus, OH 43210
Xia Ning
Professor, Biomedical Informatics, Computer Science and Engineering, The Ohio State
GenAI, Medical AI, LLMs, Drug Development