🤖 AI Summary
Molecular generation models for drug discovery often neglect synthetic feasibility, which limits how many of their proposals reach experimental validation. To address this bottleneck, this work proposes a molecular generation framework that projects molecules onto synthetically accessible chemical space. Methodologically, it represents each molecule as a postfix notation of its synthetic pathway, so the representation itself encodes a route to the molecule, and it designs a Transformer-based model that translates molecular graphs into these postfix notations end to end. Because every output is expressed as a synthetic pathway, the framework is designed to ensure synthetic accessibility of generated molecules, and it can propose structurally similar, synthesizable analogs for candidates that are initially unsynthesizable. The authors highlight three capabilities: more accurate bottom-up synthesis planning, re-mapping of unsynthesizable molecules proposed by generative models into synthesizable analogs with their properties preserved, and exploration of the local synthesizable chemical space around hit molecules. The approach aims to bridge the gap between de novo molecular generation and practical synthesis.
📝 Abstract
Discovering new drug molecules is a pivotal yet challenging process due to the near-infinitely large chemical space and notorious demands on time and resources. Numerous generative models have recently been introduced to accelerate the drug discovery process, but their progression to experimental validation remains limited, largely due to a lack of consideration for synthetic accessibility in practical settings. In this work, we introduce a novel framework that is capable of generating new chemical structures while ensuring synthetic accessibility. Specifically, we introduce a postfix notation of synthetic pathways to represent molecules in chemical space. Then, we design a transformer-based model to translate molecular graphs into postfix notations of synthesis. We highlight the model's ability to: (a) perform bottom-up synthesis planning more accurately, (b) generate structurally similar, synthesizable analogs for unsynthesizable molecules proposed by generative models with their properties preserved, and (c) explore the local synthesizable chemical space around hit molecules.
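To make the idea of a postfix notation of a synthetic pathway concrete, the following is a minimal sketch of how such a sequence could be evaluated with a stack, in the same way as a postfix arithmetic expression. It is an illustration under stated assumptions, not the paper's implementation: the token names (`BB_acid`, `RXN_amide`, and so on), the building-block table, and the reaction arities are all hypothetical, and no real chemistry is performed. Each "product" is just a nested label that records which reaction was applied to which reactants.

```python
from typing import Dict, List

# Hypothetical building blocks: token -> SMILES string (illustrative only).
BUILDING_BLOCKS: Dict[str, str] = {
    "BB_acid": "CC(=O)O",
    "BB_amine": "NCC",
}

# Hypothetical reaction tokens: token -> number of reactants consumed.
REACTIONS: Dict[str, int] = {
    "RXN_amide": 2,      # assumed two-reactant coupling
    "RXN_deprotect": 1,  # assumed single-reactant transformation
}


def evaluate_postfix(tokens: List[str]) -> str:
    """Evaluate a postfix pathway with a stack.

    Building-block tokens are pushed; a reaction token pops its reactants
    and pushes a label for the product. No chemistry is simulated here.
    """
    stack: List[str] = []
    for tok in tokens:
        if tok in BUILDING_BLOCKS:
            stack.append(BUILDING_BLOCKS[tok])
        elif tok in REACTIONS:
            arity = REACTIONS[tok]
            if len(stack) < arity:
                raise ValueError(f"{tok} needs {arity} reactant(s)")
            reactants = stack[-arity:]  # keep reactants in pushed order
            del stack[-arity:]
            stack.append(f"{tok}({', '.join(reactants)})")
        else:
            raise ValueError(f"unknown token: {tok}")
    if len(stack) != 1:
        raise ValueError("pathway must reduce to exactly one product")
    return stack[0]


if __name__ == "__main__":
    pathway = ["BB_acid", "BB_amine", "RXN_amide", "RXN_deprotect"]
    print(evaluate_postfix(pathway))
```

In this toy setting, a sequence is valid only if it reduces to a single product, which is one way to see why a pathway-based representation ties every generated molecule to an explicit route. A real system would replace the label construction with actual reaction templates applied to molecular structures.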