🤖 AI Summary
Existing retrosynthesis prediction methods neglect molecular 3D geometry, leading to inaccurate reaction center identification and 3D-implausible reactant generation. To address this, we propose a two-stage framework. In the first stage, we design a dual-graph molecular representation that integrates the primal graph with its dual graph and is augmented with molecular face encoding to localize reaction centers precisely. In the second stage, we introduce a conditional 3D diffusion model for synthon-to-reactant generation, which applies geometry-aware denoising under joint topological and spatial constraints to produce chemically valid, full-atom reactants. Evaluated on multiple benchmarks, our method significantly outperforms semi-template-based approaches: it achieves a 4.2% absolute improvement in Top-1 accuracy and establishes new state-of-the-art performance in both 3D conformational validity and reaction feasibility.
📝 Abstract
Retrosynthesis prediction focuses on identifying reactants capable of synthesizing a target product. It typically involves two phases: reaction center identification and reactant generation. However, we argue that most existing methods suffer from one limitation in each phase: (i) Existing models do not adequately capture the "face" information in molecular graphs for reaction center identification. (ii) Current approaches to reactant generation predominantly use sequence generation in 2D space, which lacks versatility in generating reasonable distributions for completed reactive groups and overlooks molecules' inherent 3D properties. To overcome these limitations, we propose GDiffRetro. For reaction center identification, GDiffRetro uniquely integrates the original graph with its corresponding dual graph to represent molecular structures, which helps guide the model to focus more on the faces in the graph. For reactant generation, GDiffRetro employs a conditional diffusion model in 3D to transform the obtained synthon into a complete reactant. Our experiments show that GDiffRetro outperforms state-of-the-art semi-template models across various evaluation metrics.
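To make the dual-graph idea concrete, the following is a minimal sketch of how a dual graph can be derived from the ring faces of a molecular graph. It is an illustration under stated assumptions, not GDiffRetro's actual construction: faces are approximated by a given list of rings, the outer (unbounded) face is omitted, and the names `ring_bonds`, `dual_graph`, and `naphthalene_rings` are hypothetical.

```python
from itertools import combinations


def ring_bonds(ring):
    """Return the bonds around one ring as sorted atom-index pairs."""
    n = len(ring)
    return {tuple(sorted((ring[i], ring[(i + 1) % n]))) for i in range(n)}


def dual_graph(rings):
    """Map each pair of ring faces that share a bond to the shared bonds.

    Assumption: each ring is treated as one inner face of the planar
    molecular graph; the outer face is ignored for simplicity.
    """
    bonds = [ring_bonds(r) for r in rings]
    edges = {}
    for i, j in combinations(range(len(rings)), 2):
        shared = bonds[i] & bonds[j]
        if shared:
            # One dual edge per pair of adjacent faces.
            edges[(i, j)] = sorted(shared)
    return edges


# Naphthalene skeleton: two six-membered rings fused at the bond between atoms 4 and 9.
naphthalene_rings = [[0, 1, 2, 3, 4, 9], [4, 5, 6, 7, 8, 9]]
dual = dual_graph(naphthalene_rings)
print(dual)
```

In this toy case the dual graph has two nodes (one per ring) joined by a single edge that corresponds to the fused bond. A model that also sees this dual view receives ring-level adjacency explicitly, instead of having to infer it from atom-level message passing alone.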