Retrosynthesis Planning via Worst-path Policy Optimisation in Tree-structured MDPs

📅 2025-09-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In retrosynthetic planning, the failure of any single leaf node invalidates the entire synthesis tree; however, existing methods optimize only average branching performance, neglecting worst-case path robustness. This work formulates retrosynthesis as a worst-path optimization problem within a tree-structured Markov decision process (Tree-MDP) — the first such formulation. We propose a Worst-Path Policy Learning framework, proving its optimal solution is unique and its policy converges monotonically. To enhance high-advantage historical decisions, we introduce self-imitation learning, and design InterRetro — an interactive learning framework jointly optimizing value function estimation and policy iteration. On the Retro*-190 benchmark, our method achieves 100% success rate, reduces average synthesis path length by 4.9%, and surpasses prior state-of-the-art using only 10% of the training data, significantly improving both planning efficiency and robustness.

Technology Category

Application Category

📝 Abstract
Retrosynthesis planning aims to decompose target molecules into available building blocks, forming a synthesis tree where each internal node represents an intermediate compound and each leaf ideally corresponds to a purchasable reactant. However, this tree becomes invalid if any leaf node is not a valid building block, making the planning process vulnerable to the"weakest link"in the synthetic route. Existing methods often optimise for average performance across branches, failing to account for this worst-case sensitivity. In this paper, we reframe retrosynthesis as a worst-path optimisation problem within tree-structured Markov Decision Processes (MDPs). We prove that this formulation admits a unique optimal solution and offers monotonic improvement guarantees. Building on this insight, we introduce Interactive Retrosynthesis Planning (InterRetro), a method that interacts with the tree MDP, learns a value function for worst-path outcomes, and improves its policy through self-imitation, preferentially reinforcing past decisions with high estimated advantage. Empirically, InterRetro achieves state-of-the-art results, solving 100% of targets on the Retro*-190 benchmark, shortening synthetic routes by 4.9%, and achieving promising performance using only 10% of the training data - representing a significant advance in computational retrosynthesis planning.
Problem

Research questions and friction points this paper is trying to address.

Optimizes retrosynthesis planning to avoid invalid synthetic routes
Addresses worst-path sensitivity in tree-structured molecular decomposition
Improves policy through self-imitation learning in Markov Decision Processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Worst-path optimization in tree MDPs
Self-imitation learning with advantage estimation
Interactive policy improvement for retrosynthesis planning
🔎 Similar Papers
No similar papers found.