CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction

📅 2024-08-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited accuracy of protein-RNA binding affinity and mutation effect prediction. It proposes CoPRA, which bridges pre-trained protein and RNA language models through the complex structure, and reports the first demonstration that language models from different biological modalities can collaborate to improve binding affinity prediction. Its core module, Co-Former, combines cross-modal sequence and structure information, and a bi-scope pre-training strategy improves Co-Former's understanding of interactions. The authors also build PRA310, a dataset of 310 protein-RNA complexes that they describe as the largest protein-RNA binding affinity dataset, and evaluate on a public mutation effect dataset. CoPRA reaches state-of-the-art performance on all of these datasets, predicting both binding affinity and the affinity change caused by mutations. The analyses further indicate that CoPRA benefits from scaling data and model size.

📝 Abstract
Accurately measuring protein-RNA binding affinity is crucial in many biological processes and in drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features and therefore cannot capture the binding mechanisms comprehensively. Recently emerged pre-trained language models, trained on massive unsupervised sequences of protein and RNA, have shown strong representation ability for various in-domain downstream tasks, including binding site prediction. However, applying language models from different domains collaboratively to complex-level tasks remains unexplored. In this paper, we propose CoPRA to bridge pre-trained language models from different biological domains via Complex structure for Protein-RNA binding Affinity prediction. We demonstrate for the first time that language models from different biological modalities can collaborate to improve binding affinity prediction. We propose a Co-Former to combine the cross-modal sequence and structure information, and a bi-scope pre-training strategy to improve Co-Former's interaction understanding. We also build the largest protein-RNA binding affinity dataset, PRA310, for performance evaluation, and test our model on a public dataset for mutation effect prediction. CoPRA reaches state-of-the-art performance on all the datasets. We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.
Problem

Research questions and friction points this paper is trying to address.

Protein-RNA binding
Prediction accuracy
Mutation impact
Innovation

Methods, ideas, or system contributions that make the work stand out.

CoPRA
Co-Former
Mutation Impact Prediction
Rong Han
BNRist, Department of Computer Science and Technology, Tsinghua University
Xiaohong Liu
UCL Cancer Institute, University College London
Tong Pan
Monash Data Futures Institute, Monash University; Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University
Jing Xu
Monash Data Futures Institute, Monash University; Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University
Xiao-Yong Wang
Monash Data Futures Institute, Monash University; Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University
Wuyang Lan
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications
Zhenyu Li
BNRist, Department of Computer Science and Technology, Tsinghua University
Zixuan Wang
BNRist, Department of Computer Science and Technology, Tsinghua University
Jiangning Song
Monash Data Futures Institute, Monash University; Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University
Guangyu Wang
Houston Methodist
Bioinformatics, Computational biology, AI, epigenetics
Ting Chen
BNRist, Department of Computer Science and Technology, Tsinghua University