SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling

📅 2025-07-15
🤖 AI Summary
3D molecular generation faces two challenges: generated molecules are often hard to synthesize, and geometric constraints are difficult to model. To address these, we propose SynCoGen, a unified non-autoregressive framework that jointly models reactions, building-block graphs, and 3D atomic coordinates for synthesis-aware co-generation. We curate SynSpace, a dataset of over 600K synthesis-aware building block graphs and 3.3M conformers. The method combines masked graph diffusion over building blocks and reactions with flow matching over atomic coordinates. Experimentally, SynCoGen achieves state-of-the-art performance in unconditional joint molecular graph and conformer generation, and it delivers competitive performance in zero-shot linker design for protein ligands. The formulation also lays a foundation for drug discovery tasks such as analog expansion, lead optimization, and direct structure conditioning.

📝 Abstract
Ensuring synthesizability in generative small molecule design remains a major challenge. While recent developments in synthesizable molecule generation have demonstrated promising results, these efforts have been largely confined to 2D molecular graph representations, limiting the ability to perform geometry-based conditional generation. In this work, we present SynCoGen (Synthesizable Co-Generation), a single framework that combines simultaneous masked graph diffusion and flow matching for synthesizable 3D molecule generation. SynCoGen samples from the joint distribution of molecular building blocks, chemical reactions, and atomic coordinates. To train the model, we curated SynSpace, a dataset containing over 600K synthesis-aware building block graphs and 3.3M conformers. SynCoGen achieves state-of-the-art performance in unconditional small molecule graph and conformer generation, and the model delivers competitive performance in zero-shot molecular linker design for protein ligand generation in drug discovery. Overall, this multimodal formulation represents a foundation for future applications enabled by non-autoregressive molecular generation, including analog expansion, lead optimization, and direct structure conditioning.

Problem

Research questions and friction points this paper is trying to address.

Ensuring synthesizability in 3D molecule generation
Overcoming limitations of 2D molecular representations
Joint modeling of reactions and atomic coordinates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint masked graph diffusion and flow matching
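Samples building blocks, reactions, and atomic coordinates from a single joint distribution
Trains on SynSpace, a curated dataset of over 600K building block graphs and 3.3M conformers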
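
The pairing of masked diffusion and flow matching listed above can be illustrated with a toy corruption step. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a shared time variable t in [0, 1] with t = 1 meaning clean data, a hypothetical mask token, independent per-token masking, and the common linear interpolant between Gaussian noise and data for the coordinates. The token names and the function names are illustrative only.

```python
import random

MASK = "<mask>"  # hypothetical mask token for discrete graph entries


def mask_tokens(tokens, p_mask, rng):
    """Masked-diffusion corruption: replace each token by MASK with probability p_mask."""
    return [MASK if rng.random() < p_mask else tok for tok in tokens]


def flow_matching_pair(x0, x1, t):
    """Linear-interpolant flow matching.

    Returns the point x_t on the straight path from noise x0 to data x1,
    and the velocity x1 - x0 that a model would be trained to regress.
    """
    x_t = [[(1.0 - t) * a + t * b for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]
    v = [[b - a for a, b in zip(p0, p1)] for p0, p1 in zip(x0, x1)]
    return x_t, v


def corrupt_joint(tokens, coords, t, rng):
    """Corrupt both modalities at a shared time t (assumption: t = 1 is clean data)."""
    noise = [[rng.gauss(0.0, 1.0) for _ in point] for point in coords]
    noisy_tokens = mask_tokens(tokens, 1.0 - t, rng)  # more masking as t -> 0
    x_t, v = flow_matching_pair(noise, coords, t)
    return noisy_tokens, x_t, v


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["bb_12", "rxn_3", "bb_7"]            # hypothetical block/reaction labels
    coords = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]    # toy atomic coordinates
    print(corrupt_joint(tokens, coords, 0.5, rng))
```

In a training loop built on this sketch, a network would see the masked tokens and noisy coordinates together, predict the original tokens at masked positions, and regress the velocity for the coordinates. How SynCoGen actually couples the two losses and schedules is not specified in this summary.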
Andrei Rekesh
University of Toronto, The Hospital for Sick Children
Miruna Cretu
University of Cambridge
Dmytro Shevchuk
University of Toronto, The Hospital for Sick Children
Vignesh Ram Somnath
PhD Student, ETH Zurich
Deep Learning, Generative Models, Protein-Ligand Docking, Drug Discovery
Pietro Liò
Professor, University of Cambridge
AI & Comp Biology -> Medicine
Robert A. Batey
University of Toronto
Mike Tyers
University of Toronto, The Hospital for Sick Children
Michał Koziarski
The Hospital for Sick Children, University of Toronto, Vector Institute
Cheng-Hao Liu
Caltech