🤖 AI Summary
Existing pre-trained models for structure-based drug discovery (SBDD) focus on either small molecules or proteins in isolation, neglecting the cross-domain binding interactions between them. Method: BIT is a general-purpose foundation model that unifies representation learning across small molecules, proteins, and protein-ligand complexes, in both 2D and 3D formats. It introduces two mixture-of-experts mechanisms: Mixture-of-Domain-Experts (MoDE), which handles biomolecules from diverse biochemical domains, and Mixture-of-Structure-Experts (MoSE), which captures positional dependencies in molecular structures. Pre-training is performed on a shared Transformer backbone via unified self-supervised denoising tasks spanning all domains. Contribution/Results: BIT achieves exceptional performance on binding affinity prediction, structure-based virtual screening, and molecular property prediction, demonstrating that deep fusion with domain-specific encoding effectively captures fine-grained protein-ligand interactions.
📝 Abstract
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advances in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, most of these pre-trained models focus on the characteristics of either small molecules or proteins, without delving into their binding interactions, the essential cross-domain relationships pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (short for Biomolecular Interaction Transformer), which can encode a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, in various data formats spanning both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in molecular structures. This mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. We then perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
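To make the Mixture-of-Domain-Experts idea concrete, here is a minimal sketch (not the paper's implementation) of domain-aware routing: each token in a protein-ligand complex is sent to the feed-forward expert matching its domain tag, while the rest of the layer stays shared. The domain tags, hidden size, and expert shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size (illustrative)

def ffn_expert(d_in, d_hid, rng):
    """One feed-forward expert: two weight matrices with a ReLU in between."""
    return (rng.standard_normal((d_in, d_hid)) * 0.1,
            rng.standard_normal((d_hid, d_in)) * 0.1)

# Hypothetical domain tags: 0 = ligand atom token, 1 = protein residue token.
experts = {0: ffn_expert(D, 16, rng), 1: ffn_expert(D, 16, rng)}

def mode_layer(x, domains, experts):
    """MoDE sketch: route each token to its domain-specific expert.

    x       : (n_tokens, D) shared token representations
    domains : (n_tokens,) integer domain tag per token
    """
    out = np.empty_like(x)
    for dom, (W1, W2) in experts.items():
        mask = domains == dom
        if mask.any():
            h = np.maximum(x[mask] @ W1, 0.0)  # ReLU
            out[mask] = h @ W2
    return out

# A toy complex: 3 ligand tokens followed by 5 protein tokens.
x = rng.standard_normal((8, D))
domains = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y = mode_layer(x, domains, experts)
print(y.shape)  # (8, 8)
```

Because routing is by domain tag rather than a learned gate, ligand and protein tokens get specialized transformations while still attending to each other in the shared backbone, which is the "deep fusion plus domain-specific encoding" trade-off the abstract describes.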