🤖 AI Summary
3D molecular generation faces two challenges: generated molecules are often hard to synthesize, and existing synthesis-aware methods operate only on 2D molecular graphs, which rules out geometry-based conditional generation. To address both, we propose SynCoGen, a unified non-autoregressive framework that jointly models building blocks, chemical reactions, and 3D atomic coordinates for synthesis-aware co-generation. We also introduce SynSpace, a dataset of over 600,000 synthesis-aware building-block graphs and 3.3 million conformers. The method combines masked graph diffusion over building blocks and reactions with flow matching over atomic coordinates. It achieves state-of-the-art performance in unconditional joint generation of molecular graphs and conformers, and competitive performance in zero-shot molecular linker design for protein–ligand generation. We position the framework as a foundation for future drug discovery applications such as analog expansion, lead optimization, and direct structure conditioning.
📝 Abstract
Ensuring synthesizability in generative small molecule design remains a major challenge. While recent developments in synthesizable molecule generation have demonstrated promising results, these efforts have been largely confined to 2D molecular graph representations, limiting the ability to perform geometry-based conditional generation. In this work, we present SynCoGen (Synthesizable Co-Generation), a single framework that simultaneously performs masked graph diffusion and flow matching for synthesizable 3D molecule generation. SynCoGen samples from the joint distribution of molecular building blocks, chemical reactions, and atomic coordinates. To train the model, we curated SynSpace, a dataset containing over 600K synthesis-aware building block graphs and 3.3M conformers. SynCoGen achieves state-of-the-art performance in unconditional small molecule graph and conformer generation, and the model delivers competitive performance in zero-shot molecular linker design for protein–ligand generation in drug discovery. Overall, this multimodal formulation lays a foundation for future applications enabled by non-autoregressive molecular generation, including analog expansion, lead optimization, and direct structure conditioning.
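To make "simultaneous masked graph diffusion and flow matching" concrete, here is a minimal sketch of one way a single shared time variable can drive both modalities during sampling. It is not SynCoGen's implementation. The linear masking schedule, the interpolation x_tau = (1 - tau) * noise + tau * data, the flat token list (standing in for a building-block graph with reaction edges), and the names `joint_sample` and `toy_denoiser` are all hypothetical assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Tuple

MASK = -1  # hypothetical mask id for an unassigned building-block slot

Coords = List[Tuple[float, ...]]
# Hypothetical joint denoiser: (tokens, coords, tau) -> (per-slot categorical
# probabilities, predicted clean coordinates). A trained network would go here.
Denoiser = Callable[[List[int], Coords, float], Tuple[List[List[float]], Coords]]


def joint_sample(
    denoiser: Denoiser,
    n_slots: int,
    n_atoms: int,
    n_steps: int = 50,
    seed: int = 0,
) -> Tuple[List[int], Coords]:
    """Toy co-generation loop sharing one time tau in [0, 1].

    tau = 0: every slot is masked and coordinates are Gaussian noise.
    tau = 1: all slots are assigned and coordinates are "clean".
    """
    rng = random.Random(seed)
    tokens = [MASK] * n_slots
    coords: Coords = [
        tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(n_atoms)
    ]
    for step in range(n_steps):
        tau = step / n_steps
        probs, x1_hat = denoiser(tokens, coords, tau)
        # Shared factor (tau_next - tau) / (1 - tau), which simplifies to
        # 1 / (n_steps - step) and equals exactly 1.0 on the final step.
        frac = 1.0 / (n_steps - step)
        # Discrete part (masked diffusion, linear schedule P[unmasked] = tau):
        # each still-masked slot is revealed with probability `frac`.
        for i, tok in enumerate(tokens):
            if tok == MASK and rng.random() < frac:
                tokens[i] = rng.choices(range(len(probs[i])), weights=probs[i])[0]
        # Continuous part (flow matching, Euler step): the velocity implied by
        # a clean-coordinate prediction is (x1_hat - x) / (1 - tau).
        coords = [
            tuple(x + frac * (xh - x) for x, xh in zip(atom, atom_hat))
            for atom, atom_hat in zip(coords, x1_hat)
        ]
    return tokens, coords


def toy_denoiser(tokens: List[int], coords: Coords, tau: float):
    # Hypothetical stand-in: uniform over 4 building blocks per slot and a
    # fixed "clean" conformer with atom i at (i, 0, 0).
    probs = [[0.25] * 4 for _ in tokens]
    target = [(float(i), 0.0, 0.0) for i in range(len(coords))]
    return probs, target


tokens, coords = joint_sample(toy_denoiser, n_slots=3, n_atoms=5, n_steps=20)
print(tokens, coords)
```

The design choice illustrated here is that both updates read the same denoiser output at the same tau, so the discrete and continuous modalities are refined together in every step rather than one after the other. With the toy denoiser, every slot is assigned by the last step and the coordinates land on the fixed target.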