🤖 AI Summary
This paper addresses the link prediction task of recommending future collaborators in academic social networks. Existing approaches rely heavily on topological structure and neglect semantic and performance-related author attributes. To overcome this limitation, we propose a supervised method that integrates four complementary features for each author pair: (1) research interest similarity, derived from topic embeddings; (2) institutional affiliation similarity, computed via hierarchical institutional encoding; (3) aggregated research performance indices of the pair, such as h-index and publication count; and (4) similarity between the two author nodes in the co-authorship network. We evaluate our approach using XGBoost and Random Forest classifiers on the ArnetMiner and DBLP datasets. Experimental results show statistically significant improvements in accuracy and AUC over baseline methods, supporting the effectiveness and generalizability of incorporating semantic and performance features for large-scale academic link prediction.
📝 Abstract
Predicting future research collaborations between authors in academic social networks (SNs) is a representative instance of the link prediction problem, which asks whether a link will exist between a pair of nodes (authors) in a co-authorship network. Various similarity and aggregation metrics have been proposed in the literature for predicting potential links between two authors in such networks. However, prior work has not investigated how the similarity of two authors' research interests, or the similarity of their affiliations, affects link prediction performance. Nor has it examined the impact of aggregating the two authors' research performance indices. To this end, we propose an integrative supervised learning framework for predicting potential collaborations in a co-authorship network based on the similarity of the research interests and the similarity of the affiliations of each author pair. The framework combines these two similarities with the aggregated research performance indices of each author pair and the similarity between the two authors' nodes, giving four metrics for predicting the potential link between any two authors. Experimental results on the two largest connected graphs of two large academic co-authorship networks, ArnetMiner and DBLP, show that the approach performs well in predicting potential links between authors on large-scale academic SNs.
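To make the four metrics concrete, the following minimal Python sketch builds one feature vector for an author pair. It is an illustration under stated assumptions, not the paper's implementation: cosine similarity over interest keywords, Jaccard similarity over affiliation sets, a plain sum of h-indices as the aggregation, and common-neighbor count as the node similarity are all choices made for this example, and the names `pair_features`, `authors`, and `graph` are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two keyword-count mappings."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(u, v, authors, graph):
    """Return [interest_sim, affiliation_sim, performance_agg, node_sim].

    All four definitions are assumptions made for this sketch.
    """
    interest_sim = cosine_similarity(
        Counter(authors[u]["interests"]), Counter(authors[v]["interests"])
    )
    affiliation_sim = jaccard(
        set(authors[u]["affiliations"]), set(authors[v]["affiliations"])
    )
    # Assumed aggregation: sum of the two authors' h-indices.
    performance_agg = authors[u]["h_index"] + authors[v]["h_index"]
    # Assumed node similarity: number of shared co-authors.
    node_sim = len(graph[u] & graph[v])
    return [interest_sim, affiliation_sim, performance_agg, node_sim]


# Toy data, invented purely for illustration.
authors = {
    "alice": {"interests": ["graphs", "ml"], "affiliations": ["Univ A"], "h_index": 12},
    "bob": {"interests": ["graphs", "nlp"], "affiliations": ["Univ A", "Lab B"], "h_index": 8},
    "carol": {"interests": ["nlp"], "affiliations": ["Lab B"], "h_index": 5},
}
# Co-authorship graph as an adjacency mapping of sets.
graph = {"alice": {"carol"}, "bob": {"carol"}, "carol": {"alice", "bob"}}

features = pair_features("alice", "bob", authors, graph)
print(features)
```

In a supervised setting, vectors like this would be computed for pairs labeled as linked or not linked and passed to a classifier of one's choice; the abstract does not specify the similarity or aggregation formulas, so any of them can be swapped without changing the overall structure.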