Inferred global dense residue transition graphs from primary structure sequences enable protein interaction prediction via directed graph convolutional neural networks

📅 2025-10-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Accurate prediction of protein–protein interactions (PPIs) is critical for functional annotation and drug discovery. Existing methods rely either on computationally expensive 3D structure modeling or on direct sequence embeddings from protein language models (PLMs), and both incur high resource overhead. To address this, we propose ProtGram-DirectGCN, a two-stage framework. ProtGram first infers a hierarchy of global directed residue transition graphs (n-gram graphs) directly from protein sequences, with edge weights given by transition probabilities aggregated over a large sequence corpus. DirectGCN, a custom graph convolutional network designed for dense, heterophilic, directed graphs, then learns residue-level embeddings on these graphs. Its layers combine incoming, outgoing, undirected, and shared transformations through a learnable gating mechanism, and attention-based pooling turns the residue embeddings into protein-level embeddings. Because the approach requires no 3D structural data, it lowers computational demands, and it remains robust when training data are limited. On standard node classification benchmarks, DirectGCN matches established methods and excels on dense, heterophilic, directed graphs; in PPI prediction, the full framework delivers robust predictive performance.
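The graph-construction step summarized above can be illustrated with a short, stdlib-only Python sketch. It is a minimal reading of the description, not the authors' code: the function name `build_transition_graph` and the toy corpus are hypothetical, and for n > 1 it assumes that consecutive nodes are overlapping n-grams shifted by one residue. Each node's outgoing counts are normalized into transition probabilities, which become the directed edge weights.

```python
from collections import defaultdict


def build_transition_graph(sequences, n=1):
    """Hypothetical helper: global directed n-gram graph {src: {dst: P(dst | src)}}.

    Counts how often n-gram `dst` immediately follows n-gram `src`
    (assumed: overlapping n-grams shifted by one residue) across the
    whole corpus, then normalizes each node's outgoing counts.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
        for src, dst in zip(grams, grams[1:]):
            counts[src][dst] += 1
    graph = {}
    for src, successors in counts.items():
        total = sum(successors.values())
        # Row-normalize: outgoing edge weights of each node sum to 1.
        graph[src] = {dst: c / total for dst, c in successors.items()}
    return graph


corpus = ["MKTAYIAK", "MKTLLV", "AKMK"]  # toy sequences, not real data
residue_graph = build_transition_graph(corpus, n=1)
print(residue_graph["M"])  # {'K': 1.0}
print(residue_graph["K"])  # {'T': 0.6666666666666666, 'M': 0.3333333333333333}
```

Because edge weights are conditional probabilities, the graph is directed and generally asymmetric: in the toy corpus, P(K | M) = 1.0 while P(M | K) = 1/3. Over a large corpus, nearly every pair of the 20 standard residues is likely to occur as a transition, which is one reason such graphs are dense.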

📝 Abstract
Introduction: Accurate prediction of protein-protein interactions (PPIs) is crucial for understanding cellular functions and advancing drug development. Existing in-silico methods either use direct sequence embeddings from protein language models (PLMs) or apply graph neural networks (GNNs) to 3D protein structures. This study explores less computationally intensive alternatives and introduces a novel framework that predicts PPIs downstream through link prediction.

Methods: We introduce ProtGram-DirectGCN, a two-stage graph representation learning framework. First, ProtGram models a protein's primary structure as a hierarchy of globally inferred n-gram graphs. Each graph is directed: an edge connects a pair of residues, and its weight is the residue transition probability aggregated over a large corpus of sequences. Second, DirectGCN is a custom directed graph convolutional neural network built around a new convolutional layer. The layer processes information through separate path-specific transformations (incoming, outgoing, and undirected) together with a shared transformation, and combines these paths through a learnable gating mechanism. Applying DirectGCN to ProtGram graphs yields residue-level embeddings, which attention-based pooling aggregates into protein-level embeddings for prediction.

Results: We first established the efficacy of DirectGCN on standard node classification benchmarks. Its performance matches established methods on general datasets, and it excels on complex directed graphs with dense, heterophilic structure. Applied to PPI prediction, the full ProtGram-DirectGCN framework delivers robust predictive power, and this performance holds even with limited training data.
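The DirectGCN layer described in the abstract can be sketched in NumPy. This is a minimal, hypothetical sketch, not the authors' implementation: the class name `DirectGCNLayerSketch`, the row normalization of each adjacency operator, the independent sigmoid gate per path, and treating the shared transformation as a transform of each node's own features are all assumptions, since the abstract does not give the layer's equations.

```python
import numpy as np


def row_normalize(a):
    """Scale each row to sum to 1; rows without edges stay all-zero."""
    a = np.asarray(a, dtype=float)
    deg = a.sum(axis=1, keepdims=True)
    return np.divide(a, deg, out=np.zeros_like(a), where=deg > 0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DirectGCNLayerSketch:
    """Hypothetical DirectGCN-style layer (assumed design, not the paper's code)."""

    PATHS = ("in", "out", "und")

    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(in_dim)
        # Separate weights per directional path plus one shared weight.
        self.weights = {
            name: rng.normal(0.0, scale, size=(in_dim, out_dim))
            for name in self.PATHS + ("shared",)
        }
        # One gate logit per directional path; these would be trained in a real model.
        self.gate_logits = np.zeros(len(self.PATHS))

    def forward(self, adj, x):
        """adj[i, j]: weight of directed edge i -> j; x: (num_nodes, in_dim)."""
        adj = np.asarray(adj, dtype=float)
        x = np.asarray(x, dtype=float)
        operators = {
            "in": row_normalize(adj.T),         # weighted average over predecessors
            "out": row_normalize(adj),          # weighted average over successors
            "und": row_normalize(adj + adj.T),  # direction-agnostic neighbourhood
        }
        gates = sigmoid(self.gate_logits)
        mixed = np.zeros((x.shape[0], self.weights["shared"].shape[1]))
        for gate, name in zip(gates, self.PATHS):
            mixed += gate * (operators[name] @ x @ self.weights[name])
        # Assumed: shared transformation applied to each node's own features.
        shared = x @ self.weights["shared"]
        return np.maximum(mixed + shared, 0.0)  # ReLU


adj = np.array([[0.0, 0.7, 0.3],
                [0.5, 0.0, 0.5],
                [1.0, 0.0, 0.0]])  # toy transition probabilities
x = np.random.default_rng(1).normal(size=(3, 4))
layer = DirectGCNLayerSketch(in_dim=4, out_dim=8)
h = layer.forward(adj, x)
print(h.shape)  # (3, 8)
```

A softmax over the gate logits would be an equally plausible gating choice; independent sigmoid gates simply allow several paths to stay fully open at once.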

Problem

Research questions and friction points this paper is trying to address.

Predicting protein-protein interactions from sequence-derived graphs
Developing directed graph neural networks for residue transition graphs
Reducing the computational cost of existing protein interaction prediction methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Global directed residue transition (n-gram) graphs inferred from protein sequences
Directed graph convolutional network with path-specific transformations and learnable gating
Attention-pooled residue embeddings for protein interaction prediction
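Attention pooling of residue embeddings into a protein embedding, followed by a pairwise score for link prediction, might look like the sketch below. It assumes that each residue in a sequence looks up its global node embedding from the ProtGram graph, that attention scores come from a single learned vector `w_att`, and that the interaction head is a sigmoid of the dot product. All three choices, and every function name, are hypothetical, since the source does not specify them.

```python
import numpy as np


def attention_pool(h, w_att):
    """Softmax attention over rows of h (num_residues, d); returns (pooled, weights)."""
    scores = h @ w_att
    scores = scores - scores.max()  # subtract max for numerical stability
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum()     # attention weights sum to 1
    return alpha @ h, alpha


def protein_embedding(sequence, node_embeddings, w_att):
    """Hypothetical: look up each residue's global embedding, then attention-pool."""
    h = np.stack([node_embeddings[res] for res in sequence])
    pooled, _ = attention_pool(h, w_att)
    return pooled


def interaction_probability(p1, p2):
    """Hypothetical link-prediction head: sigmoid of the dot product."""
    return 1.0 / (1.0 + np.exp(-float(p1 @ p2)))


rng = np.random.default_rng(0)
dim = 8
# Random stand-ins for residue embeddings that DirectGCN would learn.
node_embeddings = {aa: rng.normal(size=dim) for aa in "ACDEFGHIKLMNPQRSTVWY"}
w_att = rng.normal(size=dim)
p1 = protein_embedding("MKTAYIAK", node_embeddings, w_att)
p2 = protein_embedding("MKTLLV", node_embeddings, w_att)
prob = interaction_probability(p1, p2)
```

With `w_att` set to zeros, every residue receives the same weight and the pooled vector reduces to a plain mean, so the learned attention vector is what lets informative residues dominate the protein embedding.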