RETRO SYNFLOW: Discrete Flow Matching for Accurate and Diverse Single-Step Retrosynthesis

📅 2025-06-04

🤖 AI Summary

To balance accuracy and diversity in single-step retrosynthesis prediction, this paper proposes RETRO SYNFLOW (RSF), a reactant generation framework based on discrete flow matching. In contrast to earlier approaches, a reaction center identification step first produces synthons, which serve as the source distribution for the flow. At inference, Feynman-Kac steering with sequential Monte Carlo resampling guides generation toward promising samples, using a forward-synthesis model as the reward oracle to improve chemical feasibility. On the USPTO-50k benchmark, the method reports 60.0% top-1 accuracy, outperforming the previous state of the art by 20%, and FK-steering improves top-5 round-trip accuracy by 19% over prior template-free methods while keeping top-k accuracy competitive.
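The two metrics quoted above can be made concrete with a small sketch. It assumes the definitions commonly used in the retrosynthesis literature (top-k accuracy: the ground-truth reactants appear among the k highest-ranked predictions; round-trip accuracy: a forward model maps a predicted reactant set back to the target product); the source does not spell out its exact definitions. The function names, the string-equality comparison, and the toy lookup table standing in for a forward model are all hypothetical.

```python
def top_k_accuracy(ranked_predictions, ground_truths, k):
    """Fraction of targets whose ground-truth reactants are in the top-k predictions.

    Assumes predictions and ground truths are canonicalized strings,
    so plain equality is a valid comparison.
    """
    hits = sum(
        truth in preds[:k]
        for preds, truth in zip(ranked_predictions, ground_truths)
    )
    return hits / len(ground_truths)


def round_trip_accuracy(ranked_predictions, products, forward_model, k):
    """Fraction of top-k predicted reactant sets that the forward model maps
    back to the original product."""
    total = 0
    recovered = 0
    for preds, product in zip(ranked_predictions, products):
        for reactants in preds[:k]:
            total += 1
            recovered += forward_model(reactants) == product
    return recovered / total if total else 0.0


# Toy data: two target products, two ranked predictions each.
toy_predictions = [["A.B", "C.D"], ["E.F", "G.H"]]
toy_ground_truths = ["A.B", "G.H"]
toy_products = ["P1", "P2"]

# Hypothetical forward model: a lookup table instead of a trained network.
toy_forward_model = {"A.B": "P1", "C.D": "P1", "E.F": "X", "G.H": "P2"}.get
```

Round-trip accuracy can credit a prediction that differs from the recorded ground truth (here `C.D` also yields `P1`), which is why it is often read as a feasibility measure alongside exact-match accuracy.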

📝 Abstract

A fundamental problem in organic chemistry is identifying and predicting the series of reactions that synthesize a desired target product molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction (i.e., single-step retrosynthesis) remains challenging: even existing state-of-the-art template-free generative approaches struggle to produce an accurate yet diverse set of feasible reactions. In this paper, we model single-step retrosynthesis planning and introduce RETRO SYNFLOW (RSF), a discrete flow-matching framework that builds a Markov bridge between the prescribed target product molecule and the reactant molecule. In contrast to past approaches, RSF employs a reaction center identification step to produce intermediate structures known as synthons as a more informative source distribution for the discrete flow. To further enhance diversity and feasibility of generated samples, we employ Feynman-Kac steering with Sequential Monte Carlo based resampling to steer promising generations at inference using a new reward oracle that relies on a forward-synthesis model. Empirically, we demonstrate that RSF achieves $60.0\%$ top-1 accuracy, which outperforms the previous SOTA by $20\%$. We also substantiate the benefits of steering at inference and demonstrate that FK-steering improves top-$5$ round-trip accuracy by $19\%$ over prior template-free SOTA methods, all while preserving competitive top-$k$ accuracy results.
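The steering step described in the abstract can be illustrated with a generic Sequential Monte Carlo loop. This is a minimal sketch, not the authors' implementation: it assumes a "difference" potential of the form exp(lam * (r(x_t) - r(x_{t-1}))) and multinomial resampling, and the abstract does not say which potential or resampling scheme RSF uses. The names `fk_steer`, `propose`, `reward`, and `lam` are hypothetical. The toy integer random walk stands in for partially generated reactants, and the toy reward stands in for the forward-synthesis oracle.

```python
import math
import random


def fk_steer(init_particles, propose, reward, num_steps, lam=1.0, rng=None):
    """Propagate particles, weight them by reward improvement, and resample.

    Assumption: the potential is exp(lam * (r_new - r_old)), so the product
    of potentials along a trajectory telescopes to exp(lam * (r_final - r_initial)).
    """
    rng = rng or random.Random(0)
    particles = list(init_particles)
    prev_rewards = [reward(x) for x in particles]
    for _ in range(num_steps):
        # 1. Move every particle one step with the (unsteered) proposal.
        particles = [propose(x, rng) for x in particles]
        rewards = [reward(x) for x in particles]
        # 2. Log-weights from the difference potential, shifted for stability.
        log_w = [lam * (r - p) for r, p in zip(rewards, prev_rewards)]
        shift = max(log_w)
        weights = [math.exp(lw - shift) for lw in log_w]
        # 3. Multinomial resampling: promising particles get duplicated.
        idx = rng.choices(range(len(particles)), weights=weights, k=len(particles))
        particles = [particles[i] for i in idx]
        prev_rewards = [rewards[i] for i in idx]
    return particles


def toy_propose(x, rng):
    """Hypothetical proposal: an unbiased +/-1 random walk on integers."""
    return x + rng.choice((-1, 1))


def toy_reward(x):
    """Hypothetical reward: highest at x == 5."""
    return -abs(x - 5)


steered = fk_steer([0] * 200, toy_propose, toy_reward, num_steps=30, lam=3.0,
                   rng=random.Random(1))
unsteered = fk_steer([0] * 200, toy_propose, toy_reward, num_steps=30, lam=0.0,
                     rng=random.Random(1))
```

With `lam=0.0` all weights are equal and resampling carries no preference, so comparing the two runs isolates the effect of the reward. In the paper's setting the reward would be evaluated on candidate reactants rather than on integers.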

Problem

Research questions and friction points this paper is trying to address.

Predicting feasible single-step retrosynthesis reactions accurately

Generating diverse reactant sets for target product molecules

Improving reaction center identification and sample feasibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete flow-matching framework for retrosynthesis

Reaction center identification for intermediate synthons

Feynman-Kac steering with Sequential Monte Carlo resampling
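The first item above, a discrete flow that bridges a synthon source and a reactant target, can be pictured with a toy interpolation between two token sequences. The sketch assumes a per-position linear mixture path, in which each position independently shows the target token with probability t and the source token otherwise. This is a common choice in the discrete flow matching literature, but the source does not state which path RSF uses. The function name and the token lists (stand-ins for atom or bond labels) are hypothetical.

```python
import random


def sample_bridge_state(source_tokens, target_tokens, t, rng):
    """Sample an intermediate state x_t between a source and a target sequence.

    Assumption: a linear mixture path, where each position independently
    takes the target token with probability t and keeps the source otherwise.
    """
    if len(source_tokens) != len(target_tokens):
        raise ValueError("source and target must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        tgt if rng.random() < t else src
        for src, tgt in zip(source_tokens, target_tokens)
    ]


# Hypothetical token sequences: "*" marks an open attachment point on a synthon.
toy_source = ["C", "C", "*", "O", "*"]
toy_target = ["C", "C", "Br", "O", "H"]

rng = random.Random(0)
x_start = sample_bridge_state(toy_source, toy_target, 0.0, rng)  # equals the source
x_mid = sample_bridge_state(toy_source, toy_target, 0.5, rng)    # a mix of both
x_end = sample_bridge_state(toy_source, toy_target, 1.0, rng)    # equals the target
```

Under this assumption, starting from synthons rather than from noise means most positions already agree with the target, so only the few positions around the reaction center have to change along the path.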
