A Novel Framework for Multi-Modal Protein Representation Learning

📅 2025-10-27

🤖 AI Summary
This paper addresses two key challenges in protein function prediction: (1) distribution mismatch among embeddings of intrinsic modalities (e.g., sequence and structure), and (2) noise in the relational graphs of extrinsic modalities (e.g., protein–protein interaction networks and Gene Ontology annotations). To tackle these, the authors propose DAMPE, a unified multimodal representation framework. Its core innovations are: (1) optimal transport (OT)-based alignment that harmonizes the distributions of heterogeneous intrinsic embeddings; and (2) a conditional graph generation (CGG) mechanism in which a condition encoder fuses the aligned embeddings to guide graph reconstruction, reducing the impact of noisy edges on graph neural network (GNN) aggregation. On standard Gene Ontology (GO) benchmarks, the method matches or surpasses state-of-the-art methods, with AUPR gains of 0.002–0.013 and F<sub>max</sub> gains of 0.004–0.007. Ablation studies indicate that both the OT alignment and CGG modules contribute to performance.
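
For readers unfamiliar with the F<sub>max</sub> metric quoted above, the following is a minimal sketch of the protein-centric F<sub>max</sub> commonly used in CAFA-style GO evaluation. It is an assumption that the paper follows this exact protocol; the function name `fmax`, the threshold grid, and the toy arrays are illustrative only.

```python
import numpy as np

def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (CAFA-style sketch, not the paper's code).

    y_true:  (n_proteins, n_terms) binary GO annotations.
    y_score: (n_proteins, n_terms) predicted scores in [0, 1].
    """
    y_true = np.asarray(y_true) > 0
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.01, 1.0, 100)  # assumed grid
    n_true = y_true.sum(axis=1)
    has_true = n_true > 0
    best = 0.0
    for t in thresholds:
        pred = y_score >= t
        tp = (pred & y_true).sum(axis=1)
        n_pred = pred.sum(axis=1)
        covered = n_pred > 0  # proteins with at least one prediction at t
        if not covered.any() or not has_true.any():
            continue
        # Precision averages over covered proteins, recall over annotated ones.
        precision = (tp[covered] / n_pred[covered]).mean()
        recall = (tp[has_true] / n_true[has_true]).mean()
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Toy usage: two proteins, two GO terms (made-up numbers).
toy_true = np.array([[1, 0], [0, 1]])
toy_score = np.array([[0.9, 0.8], [0.2, 0.1]])
toy_fmax = fmax(toy_true, toy_score)
```

Because the score is the best F-measure over all thresholds, a gain of a few thousandths reflects a shift in the whole precision-recall trade-off rather than at one operating point.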

📝 Abstract
Accurate protein function prediction requires integrating heterogeneous intrinsic signals (e.g., sequence and structure) with noisy extrinsic contexts (e.g., protein-protein interactions and GO term annotations). However, two key challenges hinder effective fusion: (i) cross-modal distributional mismatch among embeddings produced by pre-trained intrinsic encoders, and (ii) noisy relational graphs of extrinsic data that degrade GNN-based information aggregation. We propose Diffused and Aligned Multi-modal Protein Embedding (DAMPE), a unified framework that addresses these challenges through two core mechanisms. First, we introduce Optimal Transport (OT)-based representation alignment that establishes correspondence between the intrinsic embedding spaces of different modalities, effectively mitigating cross-modal heterogeneity. Second, we develop a Conditional Graph Generation (CGG)-based information fusion method, in which a condition encoder fuses the aligned intrinsic embeddings to provide informative cues for graph reconstruction. Our theoretical analysis further implies that the CGG objective drives this condition encoder to absorb graph-aware knowledge into the protein representations it produces. Empirically, DAMPE outperforms or matches state-of-the-art methods such as DPFunc on standard GO benchmarks, achieving AUPR gains of 0.002-0.013 pp and Fmax gains of 0.004-0.007 pp. Ablation studies further show that OT-based alignment contributes 0.043-0.064 pp AUPR, while CGG-based fusion adds 0.005-0.111 pp Fmax. Overall, DAMPE offers a scalable and theoretically grounded approach for robust multi-modal protein representation learning, substantially enhancing protein function prediction.
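
To make the idea of OT-based alignment concrete, here is a minimal sketch using entropy-regularized optimal transport (Sinkhorn iterations) followed by a barycentric projection. This is a generic illustration, not DAMPE's actual alignment procedure, which the abstract does not specify. It assumes both modalities have already been projected to a shared dimension; the function names, the regularization strength `reg`, and the random toy embeddings are all hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, reg=0.5, n_iters=200):
    """Entropy-regularized OT plan between histograms a and b."""
    kernel = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # scale columns toward marginal b
        u = a / (kernel @ v)    # scale rows toward marginal a
    return u[:, None] * kernel * v[None, :]

def ot_align(source, target, reg=0.5):
    """Move source embeddings toward the target distribution.

    Each source point is mapped to the plan-weighted mean of the
    target points (barycentric projection). Assumes equal dimensions.
    """
    diff = source[:, None, :] - target[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()  # normalize so reg has a stable scale
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass on source points
    b = np.full(m, 1.0 / m)   # uniform mass on target points
    plan = sinkhorn_plan(cost, a, b, reg=reg)
    aligned = (plan @ target) / plan.sum(axis=1, keepdims=True)
    return aligned, plan

# Toy usage: random stand-ins for sequence and structure embeddings.
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(6, 4))
struct_emb = rng.normal(loc=2.0, size=(5, 4))
aligned_seq, plan = ot_align(seq_emb, struct_emb)
```

A smaller `reg` gives a sharper, closer-to-exact transport plan at the cost of slower convergence; libraries such as POT provide more robust solvers for real data.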

Problem

Research questions and friction points this paper is trying to address.

Addresses cross-modal distribution mismatch in protein embeddings
Mitigates noisy relational graphs in extrinsic protein data
Enhances protein function prediction through unified representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns protein embeddings using Optimal Transport
Fuses data via Conditional Graph Generation method
Integrates intrinsic and extrinsic protein information
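
The fusion idea in the points above can be illustrated with a deliberately simplified sketch: a condition encoder fuses two aligned intrinsic embeddings, and a graph-reconstruction loss pushes the fused representation to explain observed edges. Note the swap: this uses a plain inner-product edge decoder with binary cross-entropy, not the paper's conditional graph generation model, whose details are not given here. All names, shapes, and the random toy data are assumptions.

```python
import numpy as np

def condition_encoder(seq_emb, struct_emb, weight):
    """Fuse aligned embeddings (hypothetical: concat + linear + tanh)."""
    fused = np.concatenate([seq_emb, struct_emb], axis=1)
    return np.tanh(fused @ weight)

def edge_reconstruction_loss(z, adjacency):
    """Binary cross-entropy between inner-product edge logits and edges.

    Stand-in for a graph reconstruction objective; self-loops are ignored.
    """
    logits = z @ z.T
    # Numerically stable BCE with logits: max(x, 0) - x*y + log(1 + e^-|x|)
    bce = (np.maximum(logits, 0) - logits * adjacency
           + np.log1p(np.exp(-np.abs(logits))))
    off_diagonal = ~np.eye(len(z), dtype=bool)
    return bce[off_diagonal].mean()

# Toy usage: 4 proteins with 8-dim aligned embeddings per modality.
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(4, 8))
struct_emb = rng.normal(size=(4, 8))
weight = rng.normal(scale=0.1, size=(16, 4))
upper = np.triu(rng.integers(0, 2, size=(4, 4)), k=1)
adjacency = upper + upper.T  # symmetric toy interaction graph
z = condition_encoder(seq_emb, struct_emb, weight)
loss = edge_reconstruction_loss(z, adjacency)
```

Minimizing a loss of this kind with respect to `weight` is one simple way a fused representation can pick up graph-aware information, which is the intuition the abstract attributes to the CGG objective.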
Authors
Runjie Zheng
School of Computer Science and Engineering, Sun Yat-sen University (SYSU), No. 132, Outer Ring East Road, University Town, Panyu District, Guangzhou, 510006, Guangdong, China
Zhen Wang
School of Computer Science and Engineering, Sun Yat-sen University (SYSU), No. 132, Outer Ring East Road, University Town, Panyu District, Guangzhou, 510006, Guangdong, China
Anjie Qiao
School of Computer Science and Engineering, Sun Yat-sen University (SYSU), No. 132, Outer Ring East Road, University Town, Panyu District, Guangzhou, 510006, Guangdong, China
Jiancong Xie
School of Computer Science and Engineering, Sun Yat-sen University (SYSU), No. 132, Outer Ring East Road, University Town, Panyu District, Guangzhou, 510006, Guangdong, China
Jiahua Rao
Sun Yat-sen University
AI4Science, Multi-scale Learning
Yuedong Yang
School of Computer Science and Engineering, Sun Yat-sen University (SYSU), No. 132, Outer Ring East Road, University Town, Panyu District, Guangzhou, 510006, Guangdong, China