SynCoGen: Synthesizable 3D Molecule Generation via Joint Reaction and Coordinate Modeling

📅 2025-07-15

🤖 AI Summary
3D molecular generation faces two challenges: generated molecules are often hard to synthesize, and geometric constraints are difficult to model. To address both, we propose a unified non-autoregressive framework that jointly models reaction pathways, molecular graphs, and 3D conformations for synthesis-aware co-generation. We introduce SynSpace, a dataset of over 600,000 synthesis-aware building-block graphs and 3.3 million conformers. The method combines masked graph diffusion with flow matching: reaction templates guide how building blocks are assembled into a molecule, while flow matching models the distribution of atomic coordinates. Experimentally, our approach achieves state-of-the-art performance in unconditional joint generation of molecular graphs and conformers, and it delivers competitive performance in zero-shot molecular linker design for protein–ligand generation. The formulation is intended as a foundation for drug discovery applications such as analog expansion and lead optimization.

📝 Abstract
Ensuring synthesizability in generative small molecule design remains a major challenge. While recent developments in synthesizable molecule generation have demonstrated promising results, these efforts have been largely confined to 2D molecular graph representations, limiting the ability to perform geometry-based conditional generation. In this work, we present SynCoGen (Synthesizable Co-Generation), a single framework that combines simultaneous masked graph diffusion and flow matching for synthesizable 3D molecule generation. SynCoGen samples from the joint distribution of molecular building blocks, chemical reactions, and atomic coordinates. To train the model, we curated SynSpace, a dataset containing over 600K synthesis-aware building block graphs and 3.3M conformers. SynCoGen achieves state-of-the-art performance in unconditional small molecule graph and conformer generation, and the model delivers competitive performance in zero-shot molecular linker design for protein–ligand generation in drug discovery. Overall, this multimodal formulation represents a foundation for future applications enabled by non-autoregressive molecular generation, including analog expansion, lead optimization, and direct structure conditioning.
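For the continuous atomic coordinates, flow matching learns a velocity field that carries samples from a noise distribution to the data. The sketch below shows the standard conditional flow-matching target for straight-line paths on per-atom 3D coordinates, written in plain Python. The linear path, the Gaussian prior, and the function names are illustrative assumptions; the abstract does not specify which path or prior SynCoGen uses. Time runs from t = 0 (noise) to t = 1 (data).

```python
import random
from typing import List, Tuple

Coord = Tuple[float, float, float]


def interpolate(x0: List[Coord], x1: List[Coord], t: float) -> List[Coord]:
    """Point at time t on the straight path from noise x0 (t = 0) to data x1 (t = 1)."""
    return [tuple((1 - t) * a + t * b for a, b in zip(p, q)) for p, q in zip(x0, x1)]


def target_velocity(x0: List[Coord], x1: List[Coord]) -> List[Coord]:
    """Conditional vector field of a linear path: u_t = x1 - x0, constant in t."""
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(x0, x1)]


def fm_loss(pred_v: List[Coord], x0: List[Coord], x1: List[Coord]) -> float:
    """Mean squared error between predicted and target velocities, over all atoms and axes."""
    target = target_velocity(x0, x1)
    sq = [(pv - tv) ** 2 for p, q in zip(pred_v, target) for pv, tv in zip(p, q)]
    return sum(sq) / len(sq)


def sample_noise(n_atoms: int, rng: random.Random) -> List[Coord]:
    """Assumed prior: independent standard Gaussian coordinates for each atom."""
    return [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(n_atoms)]
```

In training, a network would receive `interpolate(x0, x1, t)` and t, and `fm_loss` would compare its output with the straight-line target. Linear paths are a common default because this target is constant along each path.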
Problem

Research questions and friction points this paper is trying to address.

Ensuring synthesizability in 3D molecule generation
Overcoming limitations of 2D molecular representations
Joint modeling of reactions and atomic coordinates
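One way to picture a molecule that carries both its synthesis route and its geometry is a graph in which nodes are building blocks, edges are reactions, and atomic coordinates are attached. The dataclasses below are a hypothetical in-memory layout for illustration only. The field names, the integer vocabularies, and the connectivity check are assumptions and do not describe SynSpace's actual format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


@dataclass
class ReactionEdge:
    src: int          # index of the first building block in the graph
    dst: int          # index of the second building block in the graph
    reaction_id: int  # index into a hypothetical reaction-template vocabulary


@dataclass
class BuildingBlockGraph:
    block_ids: List[int]                                # node labels: building-block vocabulary indices
    edges: List[ReactionEdge] = field(default_factory=list)
    coords: List[Coord] = field(default_factory=list)   # one 3D position per atom

    def is_connected(self) -> bool:
        """True if the reaction edges join every building block into one molecule."""
        n = len(self.block_ids)
        if n == 0:
            return False
        adj: Dict[int, List[int]] = {i: [] for i in range(n)}
        for e in self.edges:
            adj[e.src].append(e.dst)
            adj[e.dst].append(e.src)
        # Iterative depth-first search starting from block 0.
        seen, stack = {0}, [0]
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        return len(seen) == n
```

A check like this is a cheap sanity filter. If the reaction edges leave any building block disconnected, the graph does not describe a single product.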
Innovation

Methods, ideas, or system contributions that make the work stand out.

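Joint masked graph diffusion and flow matching in a single model
Samples building blocks, reactions, and 3D atomic coordinates from one joint distribution
Introduces SynSpace (over 600K building-block graphs, 3.3M conformers) for training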
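Sampling from the joint distribution means updating the discrete graph and the continuous coordinates together. The loop below is a minimal sketch under stated assumptions and is not the authors' sampler:

* `model` is a hypothetical denoiser that returns per-position token probabilities and per-atom velocities.
* Masked tokens are revealed with probability dt / (1 - t), which matches a linear masking schedule where t = 0 is fully masked.
* Coordinates follow plain Euler steps.
* `make_toy_model` is a hand-written stand-in for a trained network, included only so the loop can run.

```python
import random
from typing import Callable, List, Tuple

MASK = -1  # hypothetical absorbing "masked" token
Coord = Tuple[float, float, float]
# Hypothetical denoiser: (tokens, coords, t) -> (token probs per position, velocity per atom)
Model = Callable[[List[int], List[Coord], float], Tuple[List[List[float]], List[Coord]]]


def joint_sample(model: Model, n_tokens: int, n_atoms: int,
                 n_steps: int, rng: random.Random) -> Tuple[List[int], List[Coord]]:
    """Jointly denoise discrete tokens and 3D coordinates from t = 0 to t = 1."""
    tokens = [MASK] * n_tokens  # discrete part starts fully masked
    coords = [tuple(rng.gauss(0.0, 1.0) for _ in range(3))  # continuous part starts as noise
              for _ in range(n_atoms)]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        probs, vel = model(tokens, coords, t)
        # Discrete update: reveal each still-masked token with prob dt / (1 - t);
        # forced to 1 on the last step so that no token stays masked.
        p_unmask = 1.0 if k == n_steps - 1 else dt / (1.0 - t)
        for i, tok in enumerate(tokens):
            if tok == MASK and rng.random() < p_unmask:
                tokens[i] = rng.choices(range(len(probs[i])), weights=probs[i])[0]
        # Continuous update: one Euler step along the predicted velocity field.
        coords = [tuple(c + dt * v for c, v in zip(p, q))
                  for p, q in zip(coords, vel)]
    return tokens, coords


def make_toy_model(target: List[Coord], vocab_size: int, favourite: int) -> Model:
    """Demo stand-in for a trained network: always predicts token `favourite`
    and a velocity that steers the coordinates onto `target` by t = 1."""
    def model(tokens: List[int], coords: List[Coord], t: float):
        probs = [[1.0 if v == favourite else 0.0 for v in range(vocab_size)]
                 for _ in tokens]
        vel = [tuple((g - c) / (1.0 - t) for c, g in zip(p, q))
               for p, q in zip(coords, target)]
        return probs, vel
    return model
```

Consider `joint_sample(make_toy_model([(1.0, 2.0, 3.0)], 4, 2), 5, 1, 10, random.Random(0))`. It ends with five tokens equal to 2 and the single atom at (1.0, 2.0, 3.0), up to floating-point error. Updating both modalities inside one loop mirrors the simultaneous diffusion and flow matching described in the abstract. Whether SynCoGen shares a single t across modalities is an assumption here.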