A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery

📅 2025-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing pre-trained models for structure-based drug discovery (SBDD) neglect cross-domain interactions between proteins and ligands. Method: We propose BIT, a universal foundation model that unifies representation learning for small molecules, proteins, and their complexes (2D/3D). BIT introduces two novel mixture-of-experts mechanisms: Mixture-of-Domain-Experts (MoDE) to capture cross-domain semantics, and Mixture-of-Structure-Experts (MoSE) to encode multi-scale structural features. Integrated with geometry-aware graph neural networks, 3D coordinate encoding, and domain-adaptive attention, BIT employs multi-task self-supervised denoising pre-training atop a shared Transformer backbone. Contribution/Results: BIT achieves state-of-the-art performance across binding affinity prediction, virtual screening, and molecular property prediction—demonstrating substantial improvements in modeling complex protein–ligand interactions and establishing new benchmarks for SBDD foundation models.

📝 Abstract
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, in most approaches, the pre-trained models primarily focus on the characteristics of either small molecules or proteins, without delving into their binding interactions, which are the essential cross-domain relationships pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (an abbreviation for Biomolecular Interaction Transformer), which is capable of encoding a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, as well as various data formats, encompassing both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in molecular structures. The proposed mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. We then perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
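To make the MoDE idea above concrete, the following is a minimal sketch of hard domain routing in a mixture-of-experts feed-forward layer. This is not the authors' implementation: the expert sizes, the ReLU feed-forward shape, the token-level domain labels, and the routing rule are all simplifying assumptions for illustration; in BIT the experts sit inside a shared Transformer whose attention layers remain common across domains.

```python
# Illustrative sketch (assumptions, not the paper's code): each token carries
# a domain label ("molecule" or "protein") and is routed to a domain-specific
# feed-forward expert, while the rest of the network would be shared.
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_FF = 8, 16
DOMAINS = ("molecule", "protein")  # assumed token-level domain labels

def make_expert():
    """One expert = a tiny two-layer feed-forward network."""
    return {
        "W1": rng.standard_normal((D_MODEL, D_FF)) * 0.1,
        "W2": rng.standard_normal((D_FF, D_MODEL)) * 0.1,
    }

experts = {d: make_expert() for d in DOMAINS}

def mode_ffn(tokens, domains):
    """Hard routing: each token is processed only by its domain's expert."""
    out = np.empty_like(tokens)
    for i, (x, d) in enumerate(zip(tokens, domains)):
        e = experts[d]
        out[i] = np.maximum(x @ e["W1"], 0.0) @ e["W2"]  # ReLU FFN
    return out

# A toy protein-ligand complex: 3 ligand tokens followed by 5 protein tokens.
tokens = rng.standard_normal((8, D_MODEL))
labels = ["molecule"] * 3 + ["protein"] * 5
y = mode_ffn(tokens, labels)
print(y.shape)  # (8, 8)
```

The point of the sketch is the routing pattern: tokens from different biochemical domains share the sequence-mixing layers but get domain-specific parameters in the expert layer, which is how a single backbone can still encode domain-specific semantics.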
Problem

Research questions and friction points this paper is trying to address.

Existing pre-trained biomolecular models encode small molecules or proteins in isolation, neglecting the cross-domain binding interactions that are pivotal to SBDD.
A single model must encode diverse biochemical entities (small molecules, proteins, protein-ligand complexes) across both 2D and 3D data formats.
Without unified cross-domain representation learning, prediction accuracy for binding affinity and molecular properties is limited.
Innovation

Methods, ideas, or system contributions that make the work stand out.

BIT, a general-purpose foundation model that encodes small molecules, proteins, and protein-ligand complexes in both 2D and 3D
Mixture-of-Domain-Experts (MoDE) for domain-specific encoding and Mixture-of-Structure-Experts (MoSE) for positional dependencies within molecular structures
Unified self-supervised denoising pre-training across domains on a shared Transformer backbone
👥 Authors

Yiheng Zhu
Zhongguancun Academy & Zhongguancun Institute of Artificial Intelligence
AI for Science, Deep generative models, Protein design, Drug discovery

Mingyang Li
Alibaba Cloud Computing, Beijing, 100012, China

Junlong Liu
South China University of Technology
Natural Language Processing

Kun Fu
Alibaba Cloud Computing, Beijing, 100012, China

Jiansheng Wu
School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, Jiangsu, China

Qiuyi Li
Zhongguancun Academy & Zhongguancun Institute of Artificial Intelligence
Genomics, Foundation model, Large language model, Machine learning

Mingze Yin
Zhejiang University
Deep Learning, AI for Science, Computer Vision

Jieping Ye
Alibaba Cloud Computing, Beijing, 100012, China

Jian Wu
State Key Laboratory of Transvascular Implantation Devices of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, Zhejiang, China; School of Public Health, Zhejiang University, Hangzhou, 310058, Zhejiang, China

Zheng Wang
Alibaba Cloud Computing, Beijing, 100012, China