🤖 AI Summary
In heterophilous graphs, where neighboring nodes tend to be dissimilar, conventional GNNs lose accuracy because they rely on local, uniform aggregation; existing long-range or global aggregation methods, meanwhile, require iterative full-graph updates, which limits their scalability to large graphs.
Method: We propose SIGMA, a one-shot global aggregation mechanism grounded in SimRank structural similarity (the first integration of SimRank into heterophilous GNNs), which models long-distance node similarities directly, without iterative aggregation. Our theoretical analysis shows that it captures distant global similarity even under heterophily, and that the one-time computation has complexity $\mathcal{O}(n)$, linear in the number of nodes. Our approach comprises SimRank-based similarity modeling, global similarity-weighted aggregation, and a GNN architecture designed for heterophilous graphs.
Contribution/Results: Our method attains state-of-the-art accuracy for heterophilous graph learning and, on the large-scale Pokec dataset (over 30 million edges), achieves a 5× aggregation speedup over the best baseline, improving both efficiency and scalability.
📝 Abstract
Graph neural networks (GNNs) have achieved great success in graph learning but suffer performance loss under heterophily, i.e., when neighboring nodes are dissimilar, because of their local and uniform aggregation. Existing heterophilous GNNs incorporate long-range or global aggregations to distinguish nodes in the graph. However, these aggregations usually require iteratively maintaining and updating full-graph information, which limits their efficiency when applied to large-scale graphs. In this paper, we propose SIGMA, an efficient global heterophilous GNN aggregation that integrates the structural similarity measure SimRank. Our theoretical analysis shows that SIGMA inherently captures distant global similarity even under heterophily, which conventional approaches can only achieve after iterative aggregations. Furthermore, it enjoys an efficient one-time computation whose complexity is linear in the size of the node set, $\mathcal{O}(n)$. Comprehensive evaluation demonstrates that SIGMA achieves state-of-the-art performance with superior aggregation and overall efficiency. Notably, it obtains a $5\times$ acceleration over the best baseline aggregation on the large-scale heterophilous dataset Pokec, which has over 30 million edges.
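To make the idea of similarity-based global aggregation concrete, the following is a minimal sketch, not SIGMA's implementation. It computes SimRank with the standard naive iteration (two nodes are similar if their neighbors are similar), which costs far more than the linear-time computation the abstract reports, and then averages features over each node's most similar nodes. The function names `simrank` and `aggregate`, the decay value 0.6, the top-k cutoff, and the toy graph are all assumptions chosen for illustration.

```python
from itertools import product


def simrank(neighbors, decay=0.6, iterations=10):
    """Naive iterative SimRank on an undirected graph {node: [neighbors]}."""
    nodes = list(neighbors)
    # Start from s(u, u) = 1 and s(u, v) = 0 for u != v.
    sim = {(u, v): 1.0 if u == v else 0.0 for u, v in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for u, v in product(nodes, repeat=2):
            if u == v:
                new[(u, v)] = 1.0
            elif not neighbors[u] or not neighbors[v]:
                new[(u, v)] = 0.0
            else:
                # Average similarity over all pairs of neighbors, scaled by decay.
                total = sum(sim[(a, b)] for a in neighbors[u] for b in neighbors[v])
                new[(u, v)] = decay * total / (len(neighbors[u]) * len(neighbors[v]))
        sim = new
    return sim


def aggregate(sim, features, top_k=2):
    """Similarity-weighted average of features over each node's top-k similar nodes."""
    out = {}
    for u in features:
        # The node itself always ranks first because s(u, u) = 1.
        ranked = sorted(features, key=lambda v: sim[(u, v)], reverse=True)[:top_k]
        weight = sum(sim[(u, v)] for v in ranked)
        dim = len(features[u])
        out[u] = [
            sum(sim[(u, v)] * features[v][d] for v in ranked) / weight
            for d in range(dim)
        ]
    return out


# Toy heterophilous graph: every edge joins a class-1 node (a, b) to a class-0 node (x, y).
graph = {"a": ["x", "y"], "b": ["x", "y"], "x": ["a", "b"], "y": ["a", "b"]}
features = {"a": [1.0], "b": [1.0], "x": [0.0], "y": [0.0]}

sim = simrank(graph)
smoothed = aggregate(sim, features)
```

In this toy graph, `a` and `b` are never adjacent, yet they receive a positive SimRank score because they share neighbors, while the directly connected pair `a` and `x` scores zero. Aggregating by similarity therefore mixes `a` with `b` rather than with its dissimilar neighbors, which is the behavior a purely local aggregation cannot provide under heterophily.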