🤖 AI Summary
In real-world networks that lack node attributes, which link prediction algorithm works best depends strongly on network topology, yet existing work offers no systematic guidance for choosing one. Method: This paper benchmarks four stacking algorithms built on 42 topological link predictors, together with two graph neural networks, across 550 structurally diverse real-world networks under two accuracy measures (AUC and Top-k). It relates performance to structural features such as the degree distribution, triangle density, and degree assortativity, and casts algorithm selection as a meta-learning task, “network structure → best algorithm,” so that the choice adapts to each input network. Contribution/Results: No single algorithm is best on every network; all perform well on most social networks, while few perform well on economic and biological networks. Stacking with a random forest is highly scalable, surpasses graph neural networks on AUC, and is competitive with them on Top-k accuracy. The meta-learning selector outperforms all of the state-of-the-art algorithms compared and scales to large networks.
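As a rough illustration of the kind of structural features mentioned above, the following pure-Python sketch computes the mean and variance of the degree sequence, a triangle density, and the degree assortativity from an edge list. The function name `structural_features` and the dictionary keys are hypothetical, transitivity is used here as one plausible reading of "triangle density", and returning 0.0 when the assortativity is undefined is a convention chosen for this sketch; none of this is taken from the paper's implementation.

```python
from collections import defaultdict


def structural_features(edges):
    """Cheap structural features of an undirected simple graph.

    `edges` is an iterable of (u, v) pairs with at least one non-loop edge.
    Names and conventions here are illustrative assumptions.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    deg = {node: len(nbrs) for node, nbrs in adj.items()}
    n = len(deg)
    mean_deg = sum(deg.values()) / n
    var_deg = sum((d - mean_deg) ** 2 for d in deg.values()) / n

    # Transitivity: 3 * (number of triangles) / (number of connected triples).
    triples = sum(d * (d - 1) // 2 for d in deg.values())
    # Summing common neighbours over ordered adjacent pairs counts each
    # triangle six times, so halving gives 3 * (number of triangles).
    closed = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 2
    transitivity = closed / triples if triples else 0.0

    # Degree assortativity: Pearson correlation of the degrees at the two
    # ends of an edge, with every edge counted in both directions.
    pairs = [(deg[u], deg[v]) for u in adj for v in adj[u]]
    m = len(pairs)
    mean_x = sum(x for x, _ in pairs) / m
    cov = sum(x * y for x, y in pairs) / m - mean_x ** 2
    var = sum(x * x for x, _ in pairs) / m - mean_x ** 2
    # Undefined for regular graphs (zero variance); 0.0 is this sketch's choice.
    assortativity = cov / var if var > 1e-12 else 0.0

    return {
        "mean_degree": mean_deg,
        "degree_variance": var_deg,
        "triangle_density": transitivity,
        "degree_assortativity": assortativity,
    }
```

For example, `structural_features([(0, 1), (0, 2), (0, 3)])` describes a star graph, which has no triangles and is perfectly disassortative.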
📝 Abstract
Relational data are ubiquitous in real-world applications, e.g., in social network analysis or biological modeling, but networks are nearly always incompletely observed. The state of the art for predicting missing links in the hard case of a network without node attributes uses model stacking or neural network techniques. It remains unknown which approach is best, and whether or how the best choice of algorithm depends on the input network's characteristics. We answer these questions systematically using a large, structurally diverse benchmark of 550 real-world networks under two standard accuracy measures (AUC and Top-k), comparing four stacking algorithms with 42 topological link predictors, two of which we introduce here, and two graph neural network algorithms. We show that no algorithm is best across all input networks, that all algorithms perform well on most social networks, and that few perform well on economic and biological networks. Overall, model stacking with a random forest is highly scalable, surpasses graph neural networks on AUC, and is competitive with them on Top-k accuracy. However, algorithm performance depends strongly on network characteristics such as the degree distribution, triangle density, and degree assortativity. We introduce a meta-learning algorithm that exploits this variability to optimize link predictions for individual networks by selecting the best algorithm to apply; we show that it outperforms all state-of-the-art algorithms and scales to large networks.
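The selection step described at the end of the abstract can be pictured with a minimal sketch: given networks whose best-performing algorithm is already known, pick the algorithm for a new network from its structural features. The abstract does not say which meta-learner is used, so a 1-nearest-neighbour rule stands in here purely for illustration; the function `select_algorithm`, the labels `algorithm_A` and `algorithm_B`, and the feature values are invented placeholders, not results from the paper.

```python
from math import dist


def select_algorithm(training, features):
    """Return the label of the training network closest in feature space.

    `training` is a list of (feature_vector, best_algorithm_label) pairs.
    A 1-nearest-neighbour rule is an assumption of this sketch; in practice
    the features would also need to be put on comparable scales first.
    """
    _, label = min(
        ((dist(vector, features), label) for vector, label in training),
        key=lambda item: item[0],
    )
    return label


# Placeholder data: (triangle density, degree assortativity) -> best algorithm.
toy_training = [
    ((0.60, 0.20), "algorithm_A"),
    ((0.05, -0.40), "algorithm_B"),
]

choice = select_algorithm(toy_training, (0.50, 0.10))
```

With these placeholder values, the query network lies closer to the first training point, so `choice` is `"algorithm_A"`.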