🤖 AI Summary
Structure-based drug design (SBDD) suffers from fragmented tasks and the absence of a unified generative paradigm. Method: This paper introduces OMTRA, a multimodal flow matching–based unified generative framework for SBDD that jointly models pocket-conditioned de novo molecular generation, binding pose prediction, and binding affinity estimation. The approach integrates protein–ligand 3D structural priors to enable end-to-end conditional molecular generation and optimization. The authors also curate a large-scale dataset of 500 million 3D molecular conformers, which complements protein–ligand data and expands the chemical diversity available for training. Contribution/Results: The framework achieves state-of-the-art performance on pocket-conditioned de novo design and docking, although the gains from large-scale pretraining and multi-task training are reported as modest. To support reproducibility, the code, pretrained models, and dataset are openly released.
📝 Abstract
Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral to modern SBDD workflows and often rely on virtual screening via docking or pharmacophore search. Modern generative modeling approaches have focused on improving novel ligand discovery by enabling de novo design. In this work, we recognize that these tasks share a common structure and can therefore be represented as different instantiations of a consistent generative modeling framework. We propose a unified approach in OMTRA, a multi-modal flow matching model that flexibly performs many tasks relevant to SBDD, including some with no analogue in conventional workflows. Additionally, we curate a dataset of 500M 3D molecular conformers, complementing protein-ligand data and expanding the chemical diversity available for training. OMTRA obtains state-of-the-art performance on pocket-conditioned de novo design and docking; however, the effects of large-scale pretraining and multi-task training are modest. All code, trained models, and the dataset for reproducing this work are available at https://github.com/gnina/OMTRA.
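For readers unfamiliar with flow matching, the sketch below illustrates the generic idea on a single continuous modality (3D coordinates). It is not OMTRA's implementation: the function names, the straight-line interpolation path, the Euler sampler, and the toy 5-atom "ligand" are all assumptions chosen for illustration, and the discrete modalities and pocket conditioning that a multi-modal model would need are omitted. In training, a network would regress onto `target_velocity`; here an oracle velocity field for one noise/data pair stands in for the trained network.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Time derivative of the straight-line path; the usual regression target."""
    return x1 - x0

def euler_sample(velocity_fn, x0, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

rng = np.random.default_rng(0)
x1 = rng.normal(size=(5, 3))  # hypothetical stand-in for 5 ligand atom positions
x0 = rng.normal(size=(5, 3))  # Gaussian noise sample with the same shape

def oracle_velocity(x, t):
    """Exact velocity field for this single (x0, x1) pair; a trained model would replace it."""
    return (x1 - x) / (1.0 - t)

# Sampling transports the noise sample onto the data sample.
x_gen = euler_sample(oracle_velocity, x0, n_steps=10)
```

Under this reading, different tasks would differ only in which modalities start from noise and which are held fixed as conditioning, for example fixing the pocket and generating the ligand for de novo design.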