A Novel Framework for Multi-Modal Protein Representation Learning

📅 2025-10-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses two key challenges in protein function prediction: (1) distributional mismatch among the embeddings of intrinsic modalities (e.g., sequence and structure), and (2) noise in the relational graphs built from extrinsic data (e.g., protein–protein interaction networks and Gene Ontology annotations). To tackle these, the paper proposes DAMPE, a unified multimodal representation framework. Its core innovations are: (1) optimal transport (OT)-based alignment that establishes correspondence between heterogeneous intrinsic embedding spaces; and (2) a conditional graph generation (CGG) mechanism in which a condition encoder fuses the aligned embeddings to guide graph reconstruction, absorbing graph-aware knowledge into the resulting protein representations. On standard Gene Ontology (GO) benchmarks, the method outperforms or matches state-of-the-art methods, with AUPR gains of 0.002–0.013 and Fmax gains of 0.004–0.007. Ablation studies indicate that both the OT alignment and the CGG module contribute to performance.

📝 Abstract

Accurate protein function prediction requires integrating heterogeneous intrinsic signals (e.g., sequence and structure) with noisy extrinsic contexts (e.g., protein-protein interactions and GO term annotations). However, two key challenges hinder effective fusion: (i) cross-modal distributional mismatch among embeddings produced by pre-trained intrinsic encoders, and (ii) noisy relational graphs of extrinsic data that degrade GNN-based information aggregation. We propose Diffused and Aligned Multi-modal Protein Embedding (DAMPE), a unified framework that addresses these challenges through two core mechanisms. First, we introduce Optimal Transport (OT)-based representation alignment that establishes correspondence between the intrinsic embedding spaces of different modalities, mitigating cross-modal heterogeneity. Second, we develop a Conditional Graph Generation (CGG)-based information fusion method, in which a condition encoder fuses the aligned intrinsic embeddings to provide informative cues for graph reconstruction. Our theoretical analysis further implies that the CGG objective drives this condition encoder to absorb graph-aware knowledge into the protein representations it produces. Empirically, DAMPE outperforms or matches state-of-the-art methods such as DPFunc on standard GO benchmarks, achieving AUPR gains of 0.002-0.013 pp and Fmax gains of 0.004-0.007 pp. Ablation studies further show that OT-based alignment contributes 0.043-0.064 pp AUPR, while CGG-based fusion adds 0.005-0.111 pp Fmax. Overall, DAMPE offers a scalable and theoretically grounded approach to robust multi-modal protein representation learning that improves protein function prediction.
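
Since the results above are reported in Fmax, a minimal sketch of how a protein-centric Fmax is commonly computed on GO benchmarks may help in reading them. This is not taken from the paper: the function name `fmax`, the threshold grid, and the toy labels are illustrative assumptions, and the sketch assumes every protein has at least one true annotation.

```python
import numpy as np

def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax: the best F1 over a grid of score thresholds.

    y_true  : (n_proteins, n_terms) binary matrix of GO annotations
    y_score : (n_proteins, n_terms) predicted scores in [0, 1]
    The threshold grid is an assumption of this sketch.
    """
    if thresholds is None:
        thresholds = np.linspace(0.01, 0.99, 99)
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_true = y_true.sum(axis=1)
    best = 0.0
    for t in thresholds:
        pred = y_score >= t
        tp = (pred & y_true).sum(axis=1)
        n_pred = pred.sum(axis=1)
        covered = n_pred > 0  # proteins with at least one predicted term
        if not covered.any():
            continue
        # Precision is averaged over covered proteins, recall over all proteins.
        precision = (tp[covered] / n_pred[covered]).mean()
        recall = (tp / np.maximum(n_true, 1)).mean()
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best

# Toy data: two proteins, three GO terms (hypothetical values).
toy_true = [[1, 0, 1], [0, 1, 0]]
toy_perfect = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]]
toy_noisy = [[0.9, 0.8, 0.1], [0.6, 0.7, 0.2]]
print(fmax(toy_true, toy_perfect), fmax(toy_true, toy_noisy))
```

Because Fmax takes the maximum over thresholds, gains of a few thousandths correspond to small shifts in the best achievable precision-recall trade-off rather than to a fixed operating point.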

Problem

Research questions and friction points this paper is trying to address.

Addresses cross-modal distribution mismatch in protein embeddings

Mitigates noisy relational graphs in extrinsic protein data

Enhances protein function prediction through unified representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns protein embeddings using Optimal Transport

Fuses aligned intrinsic embeddings via Conditional Graph Generation (CGG)

Integrates intrinsic and extrinsic protein information
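
The Optimal Transport alignment listed above can be illustrated with a generic entropy-regularized (Sinkhorn) sketch. This is not DAMPE's implementation: it assumes both modalities have already been projected to a common dimension, uses uniform weights and a squared Euclidean cost, and the names `sinkhorn_plan`, `align_to_target`, `seq_emb`, and `struct_emb` are hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.5, n_iters=200):
    """Entropy-regularized OT plan between two uniformly weighted point sets."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # uniform source weights (assumption)
    b = np.full(m, 1.0 / m)      # uniform target weights (assumption)
    kernel = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)      # rescale rows toward marginal a
        v = b / (kernel.T @ u)    # rescale columns toward marginal b
    return u[:, None] * kernel * v[None, :]

def align_to_target(source, target, reg=0.5):
    """Map source embeddings into the target space by barycentric projection."""
    diff = source[:, None, :] - target[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / cost.max()      # normalize so that reg has a stable scale
    plan = sinkhorn_plan(cost, reg)
    aligned = (plan @ target) / plan.sum(axis=1, keepdims=True)
    return aligned, plan

# Toy embeddings standing in for sequence and structure encoders (hypothetical).
rng = np.random.default_rng(0)
seq_emb = rng.normal(size=(5, 4))
struct_emb = rng.normal(loc=2.0, size=(6, 4))
aligned, plan = align_to_target(seq_emb, struct_emb)
print(aligned.shape, plan.sum())
```

Each aligned vector is a weighted average of target embeddings, so the source points are pulled onto the target distribution. A smaller `reg` gives a sharper, closer-to-exact transport plan at the cost of slower and less stable iterations.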

🔎 Similar Papers

GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning