RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets

📅 2024-06-26
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Single-step retrosynthesis prediction is limited by training data that covers only a small fraction of valid reactions, by the difficulty of verifying reaction feasibility, and by insufficient exploration of the reaction space. To address this, the work applies Generative Flow Networks (GFlowNets) to single-step retrosynthesis, with a feasibility proxy model guiding exploration during training so that the model can propose reactions outside the dataset. Feasibility is evaluated with round-trip accuracy, which checks whether the predicted reactants can regenerate the target molecule. The method achieves competitive top-k accuracy on standard benchmarks and outperforms prior approaches on round-trip accuracy. It also reports a 37% increase in reaction diversity and a 22% improvement in the feasibility verification pass rate.

📝 Abstract
Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized with multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequently, the existing models are not encouraged to explore the space of possible reactions sufficiently. In this paper, we propose a novel single-step retrosynthesis model, RetroGFN, that can explore outside the limited dataset and return a diverse set of feasible reactions by leveraging a feasibility proxy model during the training. We show that RetroGFN achieves competitive results on standard top-k accuracy while outperforming existing methods on round-trip accuracy. Moreover, we provide empirical arguments in favor of using round-trip accuracy which expands the notion of feasibility with respect to the standard top-k accuracy metric.
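The difference between the two metrics in the abstract can be illustrated with a minimal sketch. It assumes that top-k accuracy counts a hit only when the dataset's recorded reactants appear among the top k predictions, and that round-trip accuracy counts the share of top-k predictions that a forward reaction model maps back to the target. The function names, the toy molecule labels, and the lookup-table forward model are hypothetical stand-ins, not the paper's evaluation code.

```python
from typing import Callable, Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    ground_truth: Sequence[str],
    k: int,
) -> float:
    """Fraction of targets whose recorded reactants appear in the top-k predictions."""
    hits = sum(
        1 for preds, truth in zip(predictions, ground_truth) if truth in preds[:k]
    )
    return hits / len(ground_truth)


def round_trip_accuracy(
    targets: Sequence[str],
    predictions: Sequence[Sequence[str]],
    forward_model: Callable[[str], str],
    k: int,
) -> float:
    """Fraction of top-k predicted reactant sets that the forward model maps back to the target."""
    total = 0
    feasible = 0
    for target, preds in zip(targets, predictions):
        for reactants in preds[:k]:
            total += 1
            if forward_model(reactants) == target:
                feasible += 1
    return feasible / total if total else 0.0


# Hypothetical toy data: labels stand in for molecules and reactant sets.
targets = ["T1", "T2"]
predictions = [["A.B", "C.D"], ["E.F", "G.H"]]  # ranked predictions per target
ground_truth = ["A.B", "G.H"]                   # the single recorded answer per target

# Hypothetical forward model: a lookup table instead of a trained predictor.
forward_table = {"A.B": "T1", "C.D": "T1", "E.F": "X", "G.H": "T2"}


def toy_forward_model(reactants: str) -> str:
    return forward_table.get(reactants, "")


top1 = top_k_accuracy(predictions, ground_truth, k=1)
round_trip = round_trip_accuracy(targets, predictions, toy_forward_model, k=2)
```

In this toy setup the prediction `C.D` is absent from the ground truth but still yields `T1` under the forward model, so it counts toward round-trip accuracy while being invisible to top-k accuracy. That is the sense in which round-trip accuracy expands the notion of feasibility.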
Problem

Research questions and friction points this paper is trying to address.

Predict diverse feasible reactions for target molecules
Overcome limited dataset coverage in retrosynthesis
Improve feasibility assessment with round-trip accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GFlowNets for diverse retrosynthesis
Leverages a feasibility proxy model during training
Outperforms existing methods on round-trip accuracy
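GFlowNets are trained to sample objects with probability proportional to a reward, which is what makes them a natural fit for returning many feasible reactions rather than a single best one. The page does not say which training objective RetroGFN uses; the sketch below shows trajectory balance, one widely used GFlowNet objective, purely as an assumption. The function name and the two-outcome toy example are hypothetical.

```python
import math
from typing import Sequence


def trajectory_balance_loss(
    log_z: float,
    log_pf_steps: Sequence[float],
    log_pb_steps: Sequence[float],
    reward: float,
) -> float:
    """Squared mismatch between forward flow and reward-weighted backward flow.

    log_z        -- learned log partition function (log of total reward)
    log_pf_steps -- log-probabilities of each forward step in the trajectory
    log_pb_steps -- log-probabilities of each backward step in the trajectory
    reward       -- positive reward of the terminal object (here: a feasibility score)
    """
    forward = log_z + sum(log_pf_steps)
    backward = math.log(reward) + sum(log_pb_steps)
    return (forward - backward) ** 2


# Toy case: two one-step trajectories ending in objects with rewards 3 and 1.
# A perfectly trained sampler has Z = 4 and picks them with probability 3/4 and 1/4.
log_z = math.log(4.0)
loss_a = trajectory_balance_loss(log_z, [math.log(0.75)], [0.0], reward=3.0)
loss_b = trajectory_balance_loss(log_z, [math.log(0.25)], [0.0], reward=1.0)

# A sampler that always picks the first object (probability close to 1) is penalized.
loss_greedy = trajectory_balance_loss(log_z, [math.log(0.999)], [0.0], reward=3.0)
```

Both `loss_a` and `loss_b` are zero up to floating-point error because the sampling probabilities match the reward proportions, while `loss_greedy` is positive. In a retrosynthesis setting, the reward would presumably come from the feasibility proxy model, so lower-reward but still feasible reactions keep a nonzero sampling probability.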
Piotr Gaiński
Jagiellonian University, Faculty of Mathematics and Computer Science, Krakow, Poland; Jagiellonian University, Doctoral School of Exact and Natural Sciences, Krakow, Poland
Michal Koziarski
Mila – Québec AI Institute, Montréal, Canada; Université de Montréal, Montréal, Canada
Krzysztof Maziarz
Microsoft Research, Cambridge, United Kingdom
Marwin H. S. Segler
Microsoft Research, Cambridge, United Kingdom
Jacek Tabor
Professor of Computer Science, Jagiellonian University
mathematics, computer science
Marek Śmieja
Jagiellonian University
deep learning for tabular data, generative models, unsupervised learning, cheminformatics