🤖 AI Summary
This paper addresses the limited interpretability of link prediction (LP) on graphs. We propose PHLP, a method based on persistent homology (PH) that, to the best of our knowledge, is the first to apply PH to LP without GNNs. PHLP measures how the presence or absence of a target link changes the overall topology, such as connected components and cycles, and uses those changes as discriminative features. Key contributions include: (1) the first application of PH to LP without GNNs; (2) the angle hop subgraph combined with degree double radius node labeling (Degree DRNL), which distinguishes graph information better than DRNL; and (3) an interpretable prediction pipeline that feeds PH features to a classifier alone. Experiments show that PHLP performs similarly to state-of-the-art (SOTA) GNN-based models on most benchmark datasets. Moreover, incorporating PHLP's outputs into existing GNN-based SOTA models improves performance across all benchmark datasets, which supports the view that the topological change induced by a link is a useful discriminative signal for LP.
📝 Abstract
Link prediction (LP), the task of inferring connectivity between nodes, is a significant research area for graph data, where a link carries essential information about the relationship between nodes. Although graph neural network (GNN)-based models have achieved high performance in LP, understanding why they perform well is challenging because most comprise complex neural networks. We employ persistent homology (PH), a topological data analysis method that helps analyze the topological information of graphs, to interpret the features used for prediction. We propose a novel method that employs PH for LP (PHLP), focusing on how the presence or absence of target links influences the overall topology. PHLP uses the angle hop subgraph and a new node labeling called degree double radius node labeling (Degree DRNL), which distinguishes graph information better than DRNL. Using only a classifier, PHLP performs similarly to state-of-the-art (SOTA) models on most benchmark datasets. Incorporating the outputs calculated using PHLP into existing GNN-based SOTA models improves performance across all benchmark datasets. To the best of our knowledge, PHLP is the first method to apply PH to LP without GNNs. The proposed approach, which employs PH without relying on neural networks, enables the identification of crucial factors for improving performance.
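The idea of comparing topology with and without a target link can be illustrated with a minimal sketch. This is not the PHLP pipeline: it omits the angle hop subgraph, Degree DRNL, and the filtration that persistent homology requires, and only compares Betti numbers (β0, the number of connected components, and β1, the number of independent cycles, using β1 = |E| − |V| + β0 for a graph). The function names `betti_numbers` and `topological_effect` are hypothetical and chosen for illustration.

```python
def betti_numbers(nodes, edges):
    """Return (b0, b1) for an undirected simple graph.

    b0 counts connected components; b1 counts independent cycles,
    computed as |E| - |V| + b0. Self-loops and duplicate edges are ignored.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edge_set = {frozenset((u, v)) for u, v in edges if u != v}
    components = len(parent)
    for edge in edge_set:
        u, v = edge
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    b0 = components
    b1 = len(edge_set) - len(parent) + b0
    return b0, b1


def topological_effect(nodes, edges, target):
    """Change in (b0, b1) when the target link is present versus absent.

    Illustrative assumption: the 'effect' is a plain difference of Betti
    numbers, not the persistence-based features described in the abstract.
    """
    target_key = frozenset(target)
    without = [e for e in edges if frozenset(e) != target_key]
    with_target = without + [tuple(target)]
    b0_with, b1_with = betti_numbers(nodes, with_target)
    b0_without, b1_without = betti_numbers(nodes, without)
    return b0_with - b0_without, b1_with - b1_without


# A path a-b-c-d: adding the link (a, d) closes one cycle.
path_nodes = ["a", "b", "c", "d"]
path_edges = [("a", "b"), ("b", "c"), ("c", "d")]
cycle_effect = topological_effect(path_nodes, path_edges, ("a", "d"))

# Two separate edges: adding the link (b, c) merges two components.
split_edges = [("a", "b"), ("c", "d")]
merge_effect = topological_effect(path_nodes, split_edges, ("b", "c"))

print(cycle_effect, merge_effect)
```

In this toy setting a candidate link either merges two components (β0 drops by one) or closes a cycle (β1 rises by one). Persistent homology refines this picture by tracking when such features appear and disappear along a filtration.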