🤖 AI Summary
This paper addresses two key challenges in protein function prediction: (1) distributional mismatch among the embeddings of intrinsic modalities (e.g., sequence and structure), and (2) noise in the relational graphs of extrinsic modalities (e.g., protein–protein interaction networks and Gene Ontology annotations). To tackle these, we propose a unified multimodal representation framework, DAMPE. Its core innovations are: (1) optimal transport (OT)-based alignment that harmonizes the distributions of heterogeneous intrinsic embeddings across modalities; and (2) a conditional graph generation (CGG) mechanism that dynamically constructs high-quality contextual graphs to make graph neural network (GNN) message passing more robust. On standard Gene Ontology (GO) benchmarks, our method achieves consistent improvements (AUPR gains of 0.002–0.013 and Fmax gains of 0.004–0.007), matching or surpassing state-of-the-art methods. Ablation studies confirm that both the OT alignment and CGG modules make essential contributions.
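In GO benchmarks, Fmax usually refers to the CAFA-style protein-centric maximum F-measure. The following is a minimal sketch of that standard definition, not the paper's evaluation code. The helper name `fmax`, the dictionary layout, and the 0.01-step threshold grid are illustrative assumptions. The sketch also assumes that annotations have already been propagated to ancestor GO terms and that every benchmark protein has at least one true term.

```python
def fmax(scores, truth, thresholds=None):
    """Protein-centric Fmax (CAFA-style), a hypothetical helper.

    scores: dict protein -> dict GO term -> predicted score in [0, 1]
    truth:  dict protein -> non-empty set of annotated GO terms
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]  # assumed 0.01 grid
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for prot, true_terms in truth.items():
            predicted = {go for go, s in scores.get(prot, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            recalls.append(tp / len(true_terms))      # recall: over all proteins
            if predicted:                             # precision: only proteins with >= 1 prediction
                precisions.append(tp / len(predicted))
        if not precisions:                            # no predictions at this threshold
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy example with made-up terms and scores (illustration only).
truth = {"P1": {"GO:A", "GO:B"}, "P2": {"GO:C"}}
scores = {"P1": {"GO:A": 0.9, "GO:B": 0.4, "GO:C": 0.2},
          "P2": {"GO:C": 0.7, "GO:A": 0.6}}
print(round(fmax(scores, truth), 4))  # → 0.8571
```

Because the gains reported above are absolute differences in this kind of score, even small shifts in the best threshold's precision/recall balance can move Fmax by a few thousandths.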
📝 Abstract
Accurate protein function prediction requires integrating heterogeneous intrinsic signals (e.g., sequence and structure) with noisy extrinsic contexts (e.g., protein–protein interactions and GO term annotations). However, two key challenges hinder effective fusion: (i) cross-modal distributional mismatch among the embeddings produced by pre-trained intrinsic encoders, and (ii) noisy relational graphs of extrinsic data that degrade GNN-based information aggregation. We propose Diffused and Aligned Multi-modal Protein Embedding (DAMPE), a unified framework that addresses both challenges through two core mechanisms. First, we propose an Optimal Transport (OT)-based representation alignment that establishes correspondence between the intrinsic embedding spaces of different modalities, effectively mitigating cross-modal heterogeneity. Second, we develop a Conditional Graph Generation (CGG)-based information fusion method, in which a condition encoder fuses the aligned intrinsic embeddings to provide informative cues for graph reconstruction. Moreover, our theoretical analysis implies that the CGG objective drives this condition encoder to absorb graph-aware knowledge into the protein representations it produces. Empirically, DAMPE outperforms or matches state-of-the-art methods such as DPFunc on standard GO benchmarks, achieving AUPR gains of 0.002–0.013 pp and Fmax gains of 0.004–0.007 pp. Ablation studies further show that OT-based alignment contributes 0.043–0.064 pp AUPR, while CGG-based fusion adds 0.005–0.111 pp Fmax. Overall, DAMPE offers a scalable and theoretically grounded approach for robust multi-modal protein representation learning, substantially enhancing protein function prediction.
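One common way to establish correspondence between two embedding sets with optimal transport is to solve an entropic-regularized OT problem with Sinkhorn iterations. A barycentric projection then maps one modality into the other's space. The abstract does not specify DAMPE's exact formulation, so what follows is only a sketch under stated assumptions:

- both modalities are already projected to a shared dimension;
- costs are squared Euclidean distances;
- marginals are uniform;
- the names `sinkhorn`, `align_modalities`, and `eps` are hypothetical.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.05, n_iters=500):
    """Entropic-regularized OT plan between weights a (n,) and b (m,).

    cost: (n, m) pairwise cost matrix; eps: regularization strength.
    Returns a plan whose rows sum to a and whose columns sum to b.
    """
    K = np.exp(-cost / eps)                   # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                  # alternate row/column scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def align_modalities(x, y, eps=0.05):
    """Hypothetical helper: map modality-x embeddings into modality-y space.

    Assumes x (n, d) and y (m, d) were already projected to a shared d.
    """
    n, m = len(x), len(y)
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean
    cost = cost / cost.max()                  # rescale so exp(-cost/eps) stays well-conditioned
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)        # uniform marginals (assumption)
    plan = sinkhorn(cost, a, b, eps)
    # Barycentric projection: each x_i becomes a plan-weighted average of y.
    aligned = (plan / plan.sum(axis=1, keepdims=True)) @ y
    return plan, aligned


# Toy check: y is a shuffled copy of x, so OT should recover the matching.
x = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
perm = np.array([2, 0, 3, 1])
y = x[perm]
plan, aligned = align_modalities(x, y)
print(plan.argmax(axis=1))  # → [1 3 0 2]
```

A small `eps` yields a nearly one-to-one plan. A larger `eps` spreads mass over several partners, which gives smoother but blurrier alignments. Real sequence and structure encoders typically output different dimensions, so a learned projection or a Gromov–Wasserstein variant would be needed before this step.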