OMTRA: A Multi-Task Generative Model for Structure-Based Drug Design

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Structure-based drug design (SBDD) is split across separate tasks that lack a unified generative paradigm. Method: This paper introduces OMTRA, a multi-modal flow matching framework that handles several SBDD tasks with a single model, including pocket-conditioned de novo molecular generation and docking (binding pose prediction). The authors also curate a dataset of 500 million 3D molecular conformers that complements protein-ligand data and broadens the chemical diversity available for training. Contribution/Results: OMTRA achieves state-of-the-art performance on pocket-conditioned de novo design and docking, although the gains from large-scale pretraining and multi-task training are modest. The code, trained models, and dataset are openly available.

📝 Abstract
Structure-based drug design (SBDD) focuses on designing small-molecule ligands that bind to specific protein pockets. Computational methods are integral to modern SBDD workflows and often make use of virtual screening via docking or pharmacophore search. Modern generative modeling approaches have focused on improving novel ligand discovery by enabling de novo design. In this work, we recognize that these tasks share a common structure and can therefore be represented as different instantiations of a consistent generative modeling framework. We propose a unified approach in OMTRA, a multi-modal flow matching model that flexibly performs many tasks relevant to SBDD, including some with no analogue in conventional workflows. Additionally, we curate a dataset of 500M 3D molecular conformers, complementing protein-ligand data and expanding the chemical diversity available for training. OMTRA obtains state-of-the-art performance on pocket-conditioned de novo design and docking; however, the effects of large-scale pretraining and multi-task training are modest. All code, trained models, and the dataset needed to reproduce this work are available at https://github.com/gnina/OMTRA

Problem

Research questions and friction points this paper is trying to address.

Unifies generative modeling for multiple structure-based drug design tasks
Enables de novo ligand design and docking with a single model
Expands training data with 500M molecular conformers for diversity
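
One way to picture "different instantiations of a consistent generative modeling framework" is to treat each task as a choice of which modalities the model generates and which it is given as conditioning. The sketch below is illustrative only: the modality names, the `Task` class, the `make_task` helper, and the task definitions are assumptions made for this example and are not taken from the OMTRA code.

```python
from dataclasses import dataclass

# Hypothetical modality names; the summary does not list OMTRA's actual modalities.
MODALITIES = ("ligand_identity", "ligand_coords", "pocket", "pharmacophore")


@dataclass(frozen=True)
class Task:
    """A task is a split of modalities into generated and conditioned sets."""
    name: str
    generated: frozenset
    conditioned: frozenset


def make_task(name, generated, conditioned=()):
    """Build a Task after checking that the split is well formed."""
    gen, cond = frozenset(generated), frozenset(conditioned)
    unknown = (gen | cond) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    if gen & cond:
        raise ValueError("a modality cannot be both generated and conditioned")
    if not gen:
        raise ValueError("a task must generate at least one modality")
    return Task(name, gen, cond)


# Illustrative task table (assumed, not OMTRA's actual task list).
TASKS = {
    # De novo design: generate the whole ligand given the pocket.
    "denovo_design": make_task(
        "denovo_design", {"ligand_identity", "ligand_coords"}, {"pocket"}
    ),
    # Docking: ligand identity is known, only its 3D pose is generated.
    "docking": make_task(
        "docking", {"ligand_coords"}, {"ligand_identity", "pocket"}
    ),
    # Ligand-only conformer generation, the kind of task a conformer dataset could support.
    "conformer_generation": make_task(
        "conformer_generation", {"ligand_coords"}, {"ligand_identity"}
    ),
}

for task in TASKS.values():
    print(task.name, "generates", sorted(task.generated),
          "given", sorted(task.conditioned))
```

Under this framing, adding a task amounts to adding one more entry to the table rather than training a separate model, which is the sense in which a single model can cover both de novo design and docking.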

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal flow matching model for unified SBDD tasks
Large-scale dataset of 500M 3D molecular conformers
State-of-the-art pocket-conditioned de novo design and docking
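
For readers unfamiliar with flow matching, the following is a minimal sketch of the generic technique for a single continuous modality such as atom coordinates. It assumes a linear interpolation path between a noise sample `x0` and a data sample `x1`; this path, the function names, and the oracle velocity used in the demo are assumptions for illustration, not OMTRA's actual formulation. Discrete modalities such as atom types would need a discrete flow variant, which is not shown.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity of the straight-line path; constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def fm_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the path velocity."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(x1):
    """Stand-in for a trained network: the exact velocity toward a known x1."""
    def v(x, t):
        # On the straight-line path, x1 - x0 equals (x1 - x_t) / (1 - t).
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]
    return v


# Demo with made-up 3D values: integration carries the noise sample to the data sample.
noise = [0.0, 1.0, -2.0]
data = [3.0, -1.0, 0.5]
print(euler_sample(oracle_velocity(data), noise, steps=10))
```

In training, a neural network would replace `oracle_velocity` and be fit by minimizing `fm_loss` at randomly sampled times `t`; at inference, `euler_sample` would be run with the learned velocity field.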
Authors
Ian Dunn
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh
Liv Toft
Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
Tyler Katz
Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
Juhi Gupta
Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
Riya Shah
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh
Ramith Hettiarachchi
CMU-Pitt Ph.D. Program in Computational Biology
Machine Learning, Computational Biology, AI for Science, Trustworthy ML
David R. Koes
Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh