An Equivariant Pretrained Transformer for Unified 3D Molecular Representation Learning

📅 2024-02-20
📈 Citations: 4
Influential: 0

🤖 AI Summary
Existing 3D molecular pretraining models are typically confined to a single domain, either small molecules or proteins, which prevents cross-domain knowledge transfer. To address this, we propose EPT, a unified all-atom foundation model pretrained on 3D molecules from multiple domains. Methodologically, EPT combines an E(3)-equivariant Transformer with a block-level denoising pretraining objective, so that atom-level information and block-level features (such as protein residues) are learned jointly, enabling unified modeling of small molecules, proteins, and their complexes. Experiments show that EPT achieves state-of-the-art performance in ligand binding affinity prediction and competitive performance on protein and molecular property prediction. In a screening study against the SARS-CoV-2 3CL protease, EPT ranks 7 of 8 known anti-COVID-19 drugs within the top 200 of 1,978 FDA-approved drugs, and molecular dynamics (MD) simulations then identify seven novel compounds with higher binding affinity than the top-ranked known anti-COVID-19 drug.
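
The E(3)-equivariance property mentioned above can be checked numerically. The sketch below is a toy, EGNN-style attention layer, a swapped-in illustration rather than EPT's published architecture; the function name `toy_equivariant_attention` and all weight shapes are hypothetical. It shows the general design rule: attention weights are computed only from invariant quantities (features and pairwise distances), and coordinates are updated along relative position vectors, so rotating, reflecting, or translating the input transforms the output coordinates the same way while leaving the features unchanged.

```python
import numpy as np


def toy_equivariant_attention(h, x, w_q, w_k, w_v, w_x):
    """Toy E(3)-equivariant attention layer (EGNN-style; hypothetical, not EPT's code).

    h: (n, d) invariant features, x: (n, 3) coordinates.
    Returns updated (h, x): h is E(3)-invariant, x is E(3)-equivariant.
    """
    diff = x[:, None, :] - x[None, :, :]          # (n, n, 3) relative vectors x_i - x_j
    dist2 = np.sum(diff ** 2, axis=-1)            # (n, n) squared distances (invariant)
    logits = (h @ w_q) @ (h @ w_k).T - dist2      # logits built only from invariants
    logits -= logits.max(axis=1, keepdims=True)   # stabilize the softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax over neighbors j
    h_new = h + attn @ (h @ w_v)                  # invariant feature update
    s = np.tanh(h @ w_x)[:, 0]                    # (n,) invariant per-atom scalar gate
    # Coordinates move along relative vectors weighted by invariant scalars:
    # rotations/reflections act on the update, translations cancel inside diff.
    x_new = x + np.einsum("ij,ijk->ik", attn * s[None, :], diff)
    return h_new, x_new


rng = np.random.default_rng(0)
n, d = 6, 4
h = rng.normal(size=(n, d))
x = rng.normal(size=(n, 3))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
w_x = rng.normal(size=(d, 1))

# Random orthogonal matrix (rotation or reflection) plus a random translation.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
t = rng.normal(size=3)

h1, x1 = toy_equivariant_attention(h, x, w_q, w_k, w_v, w_x)
h2, x2 = toy_equivariant_attention(h, x @ q.T + t, w_q, w_k, w_v, w_x)
print(np.allclose(h1, h2), np.allclose(x1 @ q.T + t, x2))  # True True
```

Keeping every learned quantity a function of invariants is what lets a single model handle arbitrarily posed molecules and complexes without data augmentation over orientations.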

📝 Abstract
Pretraining on a large number of unlabeled 3D molecules has proven advantageous in various scientific applications. However, prior efforts typically focus on pretraining models in a specific domain, either proteins or small molecules, missing the opportunity to leverage cross-domain knowledge. To close this gap, we introduce the Equivariant Pretrained Transformer (EPT), an all-atom foundation model that can be pretrained on 3D molecules from multiple domains. Built upon an E(3)-equivariant transformer, EPT is able not only to process atom-level information but also to incorporate block-level features (e.g., residues in proteins). Additionally, we employ a block-level denoising task, rather than the conventional atom-level denoising, as the pretraining objective. To pretrain EPT, we construct a large-scale dataset of 5.89M entries, comprising small molecules, proteins, protein-protein complexes, and protein-molecule complexes. Experimental evaluations on downstream tasks, including ligand binding affinity prediction, protein property prediction, and molecular property prediction, show that EPT significantly outperforms previous state-of-the-art methods on the first task and achieves competitive or superior performance on the remaining two. Furthermore, we demonstrate the potential of EPT in identifying small-molecule drug candidates targeting 3CL protease, a critical target in the replication of SARS-CoV-2. Among 1,978 FDA-approved drugs, EPT ranks 7 of 8 known anti-COVID-19 drugs in the top 200, indicating high recall. Using Molecular Dynamics (MD) simulations, EPT further discovers 7 novel compounds whose binding affinities are higher than that of the top-ranked known anti-COVID-19 drug, showcasing its capabilities in drug discovery.
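
The recall claim can be made concrete as recall@k: the fraction of known positives that land in the top k of a ranking. With 7 of 8 known anti-COVID-19 drugs in the top 200, recall@200 = 7/8 = 0.875. A minimal Python sketch follows; the drug IDs and the rank positions of the positives are made up purely for illustration and are not the paper's screening results.

```python
def recall_at_k(ranked_ids, positive_ids, k):
    """Fraction of known positives that appear among the first k ranked items."""
    top_k = set(ranked_ids[:k])
    positives = set(positive_ids)
    return len(top_k & positives) / len(positives)


# Hypothetical library of 1,978 candidates, already sorted by predicted affinity.
ranked = [f"drug_{i}" for i in range(1978)]
# Made-up rank positions for 8 known positives: 7 fall inside the top 200, 1 does not.
positives = [f"drug_{i}" for i in (3, 15, 40, 77, 120, 150, 199, 850)]

print(recall_at_k(ranked, positives, k=200))  # 0.875
```

Recall@k ignores how the non-positive candidates are ordered, which is why it suits a screening setting where only the shortlist is passed on to costlier checks such as MD simulation.
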
Problem

Research questions and friction points this paper is trying to address.

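Prior 3D molecular pretraining targets a single domain (proteins or small molecules), leaving cross-domain knowledge unused.
Atom-level and block-level (e.g., residue) features need to be integrated within one model.
Drug discovery for targets such as the SARS-CoV-2 3CL protease stands to benefit from a unified cross-domain model.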
Innovation

Methods, ideas, or system contributions that make the work stand out.

Equivariant Pretrained Transformer model
Block-level denoising pretraining objective
Large-scale dataset for cross-domain pretraining
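
To illustrate the difference between the two pretraining objectives, here is a minimal sketch under a loudly stated assumption: block-level noise is modeled as one random translation shared by all atoms of a block, whereas conventional atom-level noise perturbs each atom independently. EPT's exact perturbation scheme (for example, whether it also applies rotations, or how noise scales are chosen) may differ; the function names and the `block_ids` encoding are hypothetical.

```python
import numpy as np


def atom_level_noise(x, sigma, rng):
    """Conventional atom-level denoising input: independent Gaussian noise per atom."""
    eps = rng.normal(scale=sigma, size=x.shape)
    return x + eps, eps


def block_level_noise(x, block_ids, sigma, rng):
    """Hypothetical block-level perturbation: one shared random translation per block.

    x: (n_atoms, 3) coordinates.
    block_ids: (n_atoms,) integer block index per atom, numbered 0..n_blocks-1
    (e.g. residue index in a protein; how small molecules are split into
    blocks is an assumption left to the caller).
    """
    n_blocks = int(block_ids.max()) + 1
    block_eps = rng.normal(scale=sigma, size=(n_blocks, 3))
    eps = block_eps[block_ids]  # copy each block's noise vector to all of its atoms
    return x + eps, eps


def denoising_loss(pred_eps, eps):
    """Mean squared error between the model's predicted noise and the true noise."""
    return float(np.mean((pred_eps - eps) ** 2))


# Usage on a toy system of 7 atoms grouped into 3 blocks.
rng = np.random.default_rng(42)
x = rng.normal(size=(7, 3))
block_ids = np.array([0, 0, 0, 1, 1, 2, 2])

x_block, eps_block = block_level_noise(x, block_ids, sigma=0.5, rng=rng)
x_atom, eps_atom = atom_level_noise(x, sigma=0.5, rng=rng)

# Block noise keeps distances inside a block; atom noise generally does not.
d_orig = np.linalg.norm(x[0] - x[1])
print(np.isclose(np.linalg.norm(x_block[0] - x_block[1]), d_orig))  # True
print(denoising_loss(eps_block, eps_block))  # 0.0 for a perfect prediction
```

Under this assumption, every atom in a block moves together, so intra-block geometry survives the perturbation and the denoising target concerns how blocks are arranged relative to one another rather than fine atomic detail.
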
Authors

Rui Jiao
Tsinghua University
AIDD, Generative Models, Graph Neural Networks

Xiangzhe Kong
Tsinghua University
NLP, GNN, AIDD, AI4Science

Ziyang Yu
Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University; Institute for AIR, Tsinghua University

Wenbing Huang
Associate Professor, Renmin University of China
Machine Learning, AI for Science

Yang Liu
Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University; Institute for AIR, Tsinghua University