🤖 AI Summary
Predicting complete chemical reaction mechanisms (CRMs) remains challenging because existing approaches rely heavily on expert knowledge, incur high computational cost, suffer from severe hallucination in deep learning models, and neglect reactive intermediates. To address these issues, we propose a template-guided graph neural network framework that combines atom- and bond-level attention with generalized templates of mechanistic operations (TMOps), explicitly modeling multi-step intermediate evolution and reaction pathways. Trained on the large-scale ReactMech dataset, our method achieves 98.98% ± 0.12% accuracy for elementary step prediction and 95.94% ± 0.21% for full mechanism prediction. It generalizes well and is interpretable: it accurately predicts major reactions and side products, remains reliable in out-of-distribution scenarios, and reconstructs prebiotically relevant synthetic pathways, including those leading to serine and aldopentoses.
📝 Abstract
Predicting complete, step-by-step chemical reaction mechanisms (CRMs) remains a major challenge. Traditional approaches to CRM tasks rely on expert-driven experiments or costly quantum chemical computations, whereas contemporary deep learning (DL) alternatives ignore key intermediates and mechanistic steps and often suffer from hallucinations. We present DeepMech, an interpretable graph-based DL framework that employs atom- and bond-level attention, guided by generalized templates of mechanistic operations (TMOps), to generate CRMs. Trained on our curated ReactMech dataset (~30K CRMs with 100K atom-mapped and mass-balanced elementary steps), DeepMech achieves 98.98 ± 0.12% accuracy in predicting elementary steps and 95.94 ± 0.21% in complete CRM tasks. It also maintains high fidelity in out-of-distribution scenarios and in predicting side products and/or byproducts. Extending DeepMech to multistep CRMs relevant to prebiotic chemistry demonstrates its ability to reconstruct pathways from simple primordial substrates to complex biomolecules such as serine and aldopentose. Attention analysis identifies reactive atoms and bonds in line with chemical intuition, making the model interpretable and suitable for reaction design.