🤖 AI Summary
In retrosynthetic planning, a single invalid leaf node invalidates the entire synthesis tree, yet existing methods optimize only average performance across branches and neglect worst-case path robustness. This work formulates retrosynthesis as a worst-path optimization problem within a tree-structured Markov decision process (Tree-MDP), presented as the first such formulation. We propose a Worst-Path Policy Learning framework and prove that its optimal solution is unique and that its policy improves monotonically. To reinforce past decisions with high estimated advantage, we introduce self-imitation learning and design InterRetro, an interactive learning framework that jointly optimizes value function estimation and policy iteration. On the Retro*-190 benchmark, our method achieves a 100% success rate, shortens synthesis routes by 4.9% on average, and achieves promising performance using only 10% of the training data, improving both planning efficiency and robustness.
📝 Abstract
Retrosynthesis planning aims to decompose target molecules into available building blocks, forming a synthesis tree where each internal node represents an intermediate compound and each leaf ideally corresponds to a purchasable reactant. However, this tree becomes invalid if any leaf node is not a valid building block, making the planning process vulnerable to the "weakest link" in the synthetic route. Existing methods often optimise for average performance across branches, failing to account for this worst-case sensitivity. In this paper, we reframe retrosynthesis as a worst-path optimisation problem within tree-structured Markov Decision Processes (MDPs). We prove that this formulation admits a unique optimal solution and offers monotonic improvement guarantees. Building on this insight, we introduce Interactive Retrosynthesis Planning (InterRetro), a method that interacts with the tree MDP, learns a value function for worst-path outcomes, and improves its policy through self-imitation, preferentially reinforcing past decisions with high estimated advantage. Empirically, InterRetro achieves state-of-the-art results, solving 100% of targets on the Retro*-190 benchmark, shortening synthetic routes by 4.9%, and achieving promising performance using only 10% of the training data, which represents a significant advance in computational retrosynthesis planning.
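The contrast between average-branch and worst-path evaluation can be illustrated with a small sketch. This is not the InterRetro implementation: the `Node` class, the 0/1 leaf reward, the discount `gamma`, and the toy molecule names are all assumptions made for illustration. It only shows why a tree with one non-purchasable leaf can still score well on average while its worst-path value is zero.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List


@dataclass
class Node:
    """A molecule in a toy synthesis tree (hypothetical structure)."""
    name: str
    purchasable: bool = False
    children: List["Node"] = field(default_factory=list)


def leaf_reward(node: Node) -> float:
    # Assumed reward: 1 for a purchasable building block, 0 otherwise.
    return 1.0 if node.purchasable else 0.0


def worst_path_value(node: Node, gamma: float = 0.9) -> float:
    """Discounted value of the worst root-to-leaf path below `node`."""
    if not node.children:
        return leaf_reward(node)
    return gamma * min(worst_path_value(c, gamma) for c in node.children)


def average_value(node: Node, gamma: float = 0.9) -> float:
    """Discounted value averaged over branches, for comparison."""
    if not node.children:
        return leaf_reward(node)
    return gamma * mean(average_value(c, gamma) for c in node.children)


# Toy tree: target -> {A, B}, B -> {C, D}; only D is not purchasable.
broken = Node("target", children=[
    Node("A", purchasable=True),
    Node("B", children=[Node("C", purchasable=True), Node("D")]),
])

# Same tree with D replaced by a purchasable building block.
fixed = Node("target", children=[
    Node("A", purchasable=True),
    Node("B", children=[Node("C", purchasable=True),
                        Node("D", purchasable=True)]),
])

print(average_value(broken), worst_path_value(broken))
print(average_value(fixed), worst_path_value(fixed))
```

In this toy setting the average score of the broken tree stays well above zero even though the route cannot be executed, whereas the worst-path score drops to zero as soon as any leaf is unavailable, which is the sensitivity the worst-path formulation is meant to capture.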