Representation-Guided Discrete Molecular Graph Retrosynthesis

📅 2026-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing template-free, single-step retrosynthesis models are trained only on product-reactant pairs, so they learn chemical semantics implicitly, which slows convergence and limits generation quality and diversity. This work proposes Graph-oriented Representation Guidance (GRG), a framework that distills molecular representations from a pretrained encoder into a denoising diffusion Transformer. The design is chosen through a systematic study of teacher representations, endpoint and granularity choices, injection depths in the denoiser, correspondence strategies, and guidance schemes. A representation-similarity-based reranking mechanism further improves the top of the ranked list without training an additional verifier. On USPTO-50k, GRG reaches top-1/3/5/10 accuracies of 58.6/77.2/83.4/87.1 and raises diversity to 15.5. It also reaches comparable performance with 35% fewer training epochs and 30% less wall-clock time.
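The top-k accuracies quoted above count a test product as solved when its ground-truth reactant set appears among the first k ranked predictions. The sketch below illustrates that metric on toy data. The function name `top_k_accuracy` and the placeholder strings are assumptions made for illustration; they are not taken from the paper, and real evaluations typically compare canonicalized reactant strings.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    ground_truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of products whose true reactant set is in the first k predictions."""
    if len(ranked_predictions) != len(ground_truths):
        raise ValueError("need one ranked list per ground-truth entry")
    if not ground_truths:
        return 0.0
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, ground_truths)
        if truth in preds[:k]
    )
    return hits / len(ground_truths)


# Toy data: placeholder strings standing in for canonical reactant sets
# (hypothetical, not real USPTO-50k entries).
predictions = [
    ["A.B", "A.C", "B.C"],  # correct at rank 1
    ["D.E", "F.G", "H.I"],  # correct at rank 2
    ["J.K", "L.M", "N.O"],  # correct at rank 3
    ["P.Q", "R.S", "T.U"],  # never correct
]
truths = ["A.B", "F.G", "N.O", "X.Y"]

top1 = top_k_accuracy(predictions, truths, k=1)
top3 = top_k_accuracy(predictions, truths, k=3)
```

Because a hit at rank 1 is also a hit at every larger k, top-k accuracy can only stay flat or increase as k grows, which matches the ordering of the reported 58.6/77.2/83.4/87.1 figures.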

📝 Abstract

Stochastic process-based molecular graph generators have become the state of the art for template-free single-step retrosynthesis. However, these models are typically trained only on product-reactant pairs, thereby acquiring chemistry-relevant representations in an indirect and implicit manner. Meanwhile, recent advances in computer vision demonstrate that offering representation guidance to a generator can effectively distill semantics from pretrained encoders into diffusion Transformers (DiTs), substantially improving both convergence and generation quality. Whether similar gains extend to the retrosynthesis task, and what graph-specific design choices can make them work, remain open questions. To address these questions, we conduct a systematic empirical study over a unified design space spanning teacher molecular representations, endpoint and granularity choices, injection depths in the denoiser, correspondence strategies, and guidance schemes. Guided by these considerations, we develop Graph-oriented Representation Guidance (GRG), which achieves 58.6 / 77.2 / 83.4 / 87.1 top-1 / 3 / 5 / 10 accuracy on USPTO-50k, while increasing diversity to 15.5, both substantially outperforming the adopted base generator. Notably, GRG consistently improves all top-k metrics in out-of-distribution settings, suggesting that representation guidance facilitates the acquisition of intrinsic chemical semantics. Meanwhile, the introduced representation guidance reduces the number of epochs and the wall-clock time needed to reach comparable performance by 35% and 30%, respectively. In addition, we introduce a simple yet effective representation-similarity-based reranking mechanism, which further improves the top of the ranked list without training an additional verifier.
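Representation guidance of the kind described here is commonly implemented as an auxiliary alignment term that pulls the denoiser's intermediate features toward those of a frozen pretrained encoder. The following is a minimal sketch under stated assumptions, not the paper's implementation: the cosine-based loss, the node-level versus graph-level split, the mean pooling, and every function name are illustrative choices. A real model would also pass the denoiser features through a learned projection head, which is omitted here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def node_level_alignment_loss(
    student_nodes: Sequence[Vector], teacher_nodes: Sequence[Vector]
) -> float:
    """Assumed fine-grained variant: mean of (1 - cosine) over matched atoms."""
    if len(student_nodes) != len(teacher_nodes):
        raise ValueError("student and teacher must cover the same atoms")
    losses = [
        1.0 - cosine_similarity(s, t) for s, t in zip(student_nodes, teacher_nodes)
    ]
    return sum(losses) / len(losses)


def mean_pool(nodes: Sequence[Vector]) -> List[float]:
    """Average node features into a single graph-level vector."""
    dim = len(nodes[0])
    return [sum(node[i] for node in nodes) / len(nodes) for i in range(dim)]


def graph_level_alignment_loss(
    student_nodes: Sequence[Vector], teacher_graph: Vector
) -> float:
    """Assumed coarse-grained variant: align the pooled student vector."""
    return 1.0 - cosine_similarity(mean_pool(student_nodes), teacher_graph)


def guided_loss(denoising_loss: float, alignment_loss: float, weight: float) -> float:
    """Denoising objective plus a weighted alignment term (weight is a hyperparameter)."""
    return denoising_loss + weight * alignment_loss


# Toy features for a two-atom graph (hypothetical values).
student = [[1.0, 0.0], [0.0, 1.0]]
teacher_nodes = [[1.0, 0.0], [1.0, 0.0]]  # second atom is orthogonal to the student
teacher_graph = [1.0, 1.0]                # parallel to the pooled student vector

node_loss = node_level_alignment_loss(student, teacher_nodes)
graph_loss = graph_level_alignment_loss(student, teacher_graph)
total = guided_loss(denoising_loss=2.0, alignment_loss=node_loss, weight=0.5)
```

In this toy case the node-level loss penalizes the mismatched second atom while the graph-level loss is close to zero, which illustrates why the granularity of the alignment target is a design choice worth studying separately.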

Problem

Research questions and friction points this paper is trying to address.

retrosynthesis

molecular graph generation

representation guidance

template-free synthesis

chemical semantics

Innovation

Methods, ideas, or system contributions that make the work stand out.

representation guidance

molecular graph generation

retrosynthesis