AI Summary
Traditional retrosynthetic planning suffers from exponential growth of the search space and poor generalizability because it relies on iterative single-step decomposition. To address this, we propose an end-to-end paradigm for generating multi-step synthesis pathways. It formulates multi-step retrosynthesis, for the first time, as a controllable conditional sequence generation task, which makes it possible to impose hard constraints such as a prescribed step count and specified starting materials. Methodologically, we design a Transformer-based seq2seq model that jointly encodes molecular graphs and SMILES sequences, enabling molecule-level conditional pathway prediction. On the PaRoutes benchmark, our approach achieves a Top-1 accuracy 2.2–3.3× higher than state-of-the-art baselines (2.2× on the n1 test set and 3.3× on the n5 test set). Moreover, it generates chemically feasible multi-step routes for numerous unseen FDA-approved drugs, demonstrating substantial improvements in both planning efficiency and cross-molecule generalization.
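The hard constraints mentioned above (a prescribed step count and a specified starting material) can be made concrete with a small route checker. This is a minimal sketch under stated assumptions, not the authors' code. It assumes that a route is a tree of molecules whose children are the reactants that make them, that the step count is the number of reactions on the longest path from the target to a starting material, and that starting materials are the leaves of the tree. `RouteNode`, `num_steps`, `leaves`, and `satisfies_constraints` are hypothetical names. SMILES are compared as plain strings, so real inputs would need to be canonicalized first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RouteNode:
    """Hypothetical route node: a molecule (SMILES) whose children are its reactants."""
    smiles: str
    children: List["RouteNode"] = field(default_factory=list)


def num_steps(node: RouteNode) -> int:
    """Assumed step count: reactions on the longest path from this molecule to a leaf."""
    if not node.children:
        return 0
    return 1 + max(num_steps(child) for child in node.children)


def leaves(node: RouteNode) -> List[str]:
    """Starting materials are the molecules that are not made from anything else."""
    if not node.children:
        return [node.smiles]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def satisfies_constraints(route: RouteNode, n_steps: int, starting_material: str) -> bool:
    """Hard-constraint check: exact step count AND the required starting material is a leaf."""
    return num_steps(route) == n_steps and starting_material in leaves(route)


# Toy two-step route to paracetamol: reduce 4-nitrophenol to 4-aminophenol,
# then acetylate it with acetic anhydride.
route = RouteNode(
    "CC(=O)Nc1ccc(O)cc1",
    [
        RouteNode("Nc1ccc(O)cc1", [RouteNode("O=[N+]([O-])c1ccc(O)cc1")]),
        RouteNode("CC(=O)OC(C)=O"),
    ],
)

print(satisfies_constraints(route, 2, "O=[N+]([O-])c1ccc(O)cc1"))
```

A conditional generator would receive the constraints as input. A check like this one could then filter or verify its outputs afterwards.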
Abstract
Traditional computer-aided synthesis planning (CASP) methods rely on iterative single-step predictions, which cause exponential growth of the search space and limit efficiency and scalability. We introduce a transformer-based model that directly generates a multi-step synthetic route as a single string, predicting each molecule conditioned on all preceding ones. The model accommodates specific conditions, such as the desired number of steps and the starting materials. It outperforms state-of-the-art methods on the PaRoutes dataset, with a 2.2x improvement in Top-1 accuracy on the n1 test set and a 3.3x improvement on the n5 test set. It also successfully predicts routes for FDA-approved drugs not included in the training data, showcasing its ability to generalize. Although the limited diversity of the current training set may reduce performance on less common reaction types, our approach is a promising direction toward fully automated retrosynthetic planning.
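A whole route can be represented as a single string and recovered from it, as the following sketch shows. The format here is an illustrative assumption, not the paper's exact encoding. Each route is a PaRoutes-style nested dictionary with `smiles` and `children` keys, with reaction nodes omitted, and it is flattened in pre-order so that every product is written before its reactants. Under this assumption, a left-to-right decoder that emits the string predicts each molecule conditioned on everything already written, which is the usual autoregressive factorization p(route) = ∏ p(mᵢ | m₁, …, mᵢ₋₁, conditions). `serialize`, `deserialize`, and `molecule_order` are hypothetical helpers.

```python
import ast
from typing import Any, Dict, List

# Assumed route format (PaRoutes-like, reaction nodes omitted):
# {"smiles": str, "children": [route, ...]}; leaves carry no "children" key.
Route = Dict[str, Any]


def serialize(route: Route) -> str:
    """Flatten a route tree into one string, writing each product before its reactants."""
    # repr() quotes and escapes the SMILES, so the result is a valid Python literal.
    text = "{'smiles':" + repr(route["smiles"])
    children = route.get("children", [])
    if children:
        text += ",'children':[" + ",".join(serialize(child) for child in children) + "]"
    return text + "}"


def deserialize(text: str) -> Route:
    """Recover the tree from the string (safe literal parsing, no code execution)."""
    return ast.literal_eval(text)


def molecule_order(route: Route) -> List[str]:
    """SMILES in the order they appear in the serialized string (pre-order traversal)."""
    order = [route["smiles"]]
    for child in route.get("children", []):
        order.extend(molecule_order(child))
    return order


# Toy two-step route to paracetamol: 4-nitrophenol -> 4-aminophenol, then acetylation.
route: Route = {
    "smiles": "CC(=O)Nc1ccc(O)cc1",
    "children": [
        {"smiles": "Nc1ccc(O)cc1", "children": [{"smiles": "O=[N+]([O-])c1ccc(O)cc1"}]},
        {"smiles": "CC(=O)OC(C)=O"},
    ],
}

print(serialize(route))
print(molecule_order(route))
```

Because the product always precedes its reactants in this string, the decoder has already seen the product when it predicts each precursor. Under this representation, conditions such as a step count or a starting material would be extra inputs alongside the target.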