An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning

📅 2024-02-20
📈 Citations: 4
Influential: 0
🤖 AI Summary
Existing 3D molecular pretraining models are typically confined to a single domain—either small molecules or proteins—hindering cross-domain knowledge transfer. To address this, we propose EPT, the first cross-domain unified all-atom foundation model. Methodologically, EPT introduces an E(3)-equivariant Transformer architecture coupled with a novel block-level denoising pretraining objective, enabling joint representation learning of protein residues and small-molecule atoms for unified modeling of small molecules, proteins, and their complexes. Experiments demonstrate that EPT achieves state-of-the-art performance in binding affinity prediction, outperforms prior methods across diverse molecular and protein property prediction tasks, and maintains structural stability under molecular dynamics simulations. Leveraging EPT, we successfully identified seven novel candidate compounds exhibiting higher predicted binding affinity against SARS-CoV-2 targets than known antiviral drugs.

📝 Abstract
Pretraining on large numbers of unlabeled 3D molecules has shown clear benefits across scientific applications. However, prior efforts typically focus on pretraining models in a single domain, either proteins or small molecules, missing the opportunity to leverage cross-domain knowledge. To bridge this gap, we introduce the Equivariant Pretrained Transformer (EPT), an all-atom foundation model that can be pretrained on 3D molecules from multiple domains. Built upon an E(3)-equivariant transformer, EPT not only processes atom-level information but also incorporates block-level features (e.g., residues in proteins). Additionally, we employ a block-level denoising task, rather than the conventional atom-level denoising, as the pretraining objective. To pretrain EPT, we construct a large-scale dataset of 5.89M entries comprising small molecules, proteins, protein-protein complexes, and protein-molecule complexes. Experimental evaluations on downstream tasks, including ligand binding affinity prediction, protein property prediction, and molecular property prediction, show that EPT significantly outperforms previous state-of-the-art methods on the first task and achieves competitive or superior performance on the remaining two. Furthermore, we demonstrate the potential of EPT in identifying small-molecule drug candidates targeting 3CL protease, a critical target in the replication of SARS-CoV-2. Among 1,978 FDA-approved drugs, EPT ranks 7 of the 8 known anti-COVID-19 drugs within the top 200, indicating its high recall. Using Molecular Dynamics (MD) simulations, EPT further discovers 7 novel compounds whose binding affinities are higher than that of the top-ranked known anti-COVID-19 drug, showcasing its strong capabilities in drug discovery.
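The block-level denoising objective mentioned in the abstract can be sketched numerically: instead of perturbing each atom independently, one shared Gaussian translation is applied per block (e.g., per protein residue), and a model is trained to predict that shift. This is a minimal illustrative sketch under that assumption; `block_level_noise` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np

def block_level_noise(coords, block_ids, sigma=0.3, rng=None):
    """Apply one shared Gaussian translation per block (e.g., per residue),
    rather than independent per-atom noise.

    coords:    (N, 3) atom coordinates
    block_ids: (N,)   integer block assignment for each atom
    Returns (noisy_coords, noise). Hypothetical helper illustrating the
    block-level denoising idea; not the paper's exact objective.
    """
    rng = np.random.default_rng(rng)
    noise = np.zeros_like(coords)
    for b in np.unique(block_ids):
        mask = block_ids == b
        # Every atom in the block receives the same 3D shift.
        noise[mask] = rng.normal(0.0, sigma, size=3)
    # Denoising objective (conceptually): the network is trained so that
    # model(coords + noise) predicts `noise`, e.g. via an MSE loss.
    return coords + noise, noise
```

A denoising pretraining step would then minimize the squared error between the model's predicted per-block shift and the sampled `noise`.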
Problem

Research questions and friction points this paper is trying to address.

Existing 3D molecular pretraining models are confined to a single domain (proteins or small molecules), preventing cross-domain knowledge transfer.
Atom-level and block-level features (e.g., protein residues) are rarely modeled jointly within one architecture.
Drug discovery for targets such as the SARS-CoV-2 3CL protease needs more accurate binding-affinity predictors.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Equivariant Pretrained Transformer model
Block-level denoising pretraining objective
Large-scale dataset for cross-domain pretraining
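The E(3)-equivariance claimed for the backbone above means that rotating or translating the input coordinates rotates or translates the output in the same way. A minimal EGNN-style coordinate update illustrates the property numerically; this is a generic stand-in (in the spirit of Satorras et al.'s EGNN), not EPT's actual attention mechanism, and `egnn_coord_update` is a hypothetical name.

```python
import numpy as np

def egnn_coord_update(x, h, w=0.1):
    """One EGNN-style equivariant coordinate update (illustrative stand-in).

    x: (N, 3) coordinates; h: (N,) invariant scalar features.
    Messages depend only on invariants (features and squared distances),
    and coordinates move along relative vectors, so the update commutes
    with rotations and translations of x.
    """
    diff = x[:, None, :] - x[None, :, :]           # (N, N, 3) relative vectors
    dist2 = (diff ** 2).sum(-1)                    # (N, N) squared distances (invariant)
    msg = np.tanh(h[:, None] + h[None, :] - dist2) # (N, N) invariant messages
    return x + w * (diff * msg[..., None]).mean(axis=1)
```

Because `dist2` and `msg` are unchanged under a rotation R while `diff` rotates with it, `egnn_coord_update(x @ R.T, h)` equals `egnn_coord_update(x, h) @ R.T`, which can be checked directly.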
Rui Jiao
Tsinghua University
AIDD · Generative Models · Graph Neural Networks
Xiangzhe Kong
Tsinghua University
NLP · GNN · AIDD · AI4Science
Ziyang Yu
Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University; Institute for AIR, Tsinghua University
Wenbing Huang
Associate Professor, Renmin University of China
Machine Learning · AI for Science
Yang Liu
Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University; Institute for AIR, Tsinghua University