🤖 AI Summary
Addressing the challenge of balancing accuracy and diversity in single-step retrosynthetic prediction, this paper proposes a reactant generation framework based on discrete flow matching. First, a synthon source distribution is constructed via reaction center identification; then, Feynman-Kac steering coupled with sequential Monte Carlo resampling enables controllable generation. A forward-synthesis model serves as a reward oracle to enhance chemical feasibility during inference. This combination of discrete flow matching with reaction-center-guided synthon modeling distinguishes the work from past approaches. On the USPTO-50k benchmark, it achieves a top-1 accuracy of 60.0%, a 20% improvement over the prior state of the art, and FK-steering boosts top-5 round-trip accuracy by 19% over prior template-free methods, while top-k accuracy remains competitive.
📝 Abstract
A fundamental problem in organic chemistry is identifying and predicting the series of reactions that synthesize a desired target product molecule. Due to the combinatorial nature of the chemical search space, single-step reactant prediction (i.e., single-step retrosynthesis) remains challenging: even existing state-of-the-art template-free generative approaches struggle to produce an accurate yet diverse set of feasible reactions. In this paper, we model single-step retrosynthesis planning and introduce RETRO SYNFLOW (RSF), a discrete flow-matching framework that builds a Markov bridge between the prescribed target product molecule and the reactant molecule. In contrast to past approaches, RSF employs a reaction center identification step to produce intermediate structures known as synthons, which serve as a more informative source distribution for the discrete flow. To further enhance the diversity and feasibility of generated samples, we employ Feynman-Kac (FK) steering with Sequential Monte Carlo based resampling to steer generation toward promising samples at inference, using a new reward oracle that relies on a forward-synthesis model. Empirically, we demonstrate that RSF achieves $60.0\%$ top-1 accuracy, which outperforms the previous SOTA by $20\%$. We also substantiate the benefits of steering at inference and demonstrate that FK-steering improves top-$5$ round-trip accuracy by $19\%$ over prior template-free SOTA methods, all while preserving competitive top-$k$ accuracy results.
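To make the resampling idea in the abstract concrete, here is a minimal sketch of one reward-weighted Sequential Monte Carlo resampling step. It is generic multinomial resampling, not the paper's exact FK-steering scheme. The names `softmax_weights`, `resample`, and `temperature`, the exponential form of the weights, and the placeholder candidates and scores are all assumptions made for illustration. In the setting described above, the rewards would come from a forward-synthesis model that scores each candidate set of reactants.

```python
import math
import random


def softmax_weights(rewards, temperature=1.0):
    """Return normalized weights proportional to exp(reward / temperature).

    Assumption: an exponential potential; the paper may define its
    potentials differently.
    """
    top = max(rewards)  # subtract the max for numerical stability
    unnorm = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def resample(particles, rewards, temperature=1.0, rng=None):
    """Draw len(particles) particles with replacement, favoring high reward."""
    rng = rng or random.Random()
    weights = softmax_weights(rewards, temperature)
    return rng.choices(particles, weights=weights, k=len(particles))


# Hypothetical usage: three placeholder candidates with made-up rewards.
candidates = ["candidate_a", "candidate_b", "candidate_c"]
scores = [0.0, 0.0, 1.0]
survivors = resample(candidates, scores, temperature=0.5, rng=random.Random(0))
```

Lower values of `temperature` concentrate the population on the highest-reward candidates, while higher values keep more diversity among the survivors.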