Supervised Link Prediction in Co-Authorship Networks Based on Author Node-Based Features

📅 2025-05-27
🤖 AI Summary
This paper addresses link prediction for recommending future collaborators in academic social networks. Existing approaches rely heavily on topological structure and neglect semantic and performance-related author attributes; to overcome this limitation, we propose a supervised method that integrates multidimensional node features. Specifically, we combine four complementary node-level features: (1) research interest similarity, derived from topic embeddings; (2) institutional affiliation similarity, computed via hierarchical institutional encoding; (3) an aggregation of the author pair's research performance indices, such as h-index and publication count; and (4) the similarity between the two author nodes. We evaluate our approach using XGBoost and Random Forest classifiers on the ArnetMiner and DBLP datasets. Experimental results show statistically significant improvements in accuracy and AUC over baseline methods, supporting the effectiveness and generalizability of incorporating semantic and performance features for large-scale academic link prediction.

📝 Abstract
Predicting the emergence of future research collaborations between authors in academic social networks (SNs) is a representative instance of the link prediction problem, which refers to predicting the existence or absence of a link between a pair of nodes (authors) in a co-authorship network. Various similarity and aggregation metrics have been proposed in the literature for predicting a potential link between two authors in such networks. However, prior work has not investigated how the similarity of two authors' research interests, or the similarity of their affiliations, affects the performance of predicting a potential link between them. Nor has it examined the impact of aggregating the two authors' research performance indices on link prediction performance. To this end, we propose an integrative supervised learning framework for predicting potential collaborations in co-authorship networks based on the similarity of the research interests and the similarity of the affiliations of each pair of authors. The framework also integrates the aggregation of each author pair's research performance indices and the similarity between the two author nodes; together with research interest and affiliation similarity, these form four metrics for predicting a potential link between each pair of authors. Experimental results from applying the proposed approach to the largest connected graphs of two large academic co-authorship networks, ArnetMiner and DBLP, show that it performs well in predicting potential links between authors on large-scale academic SNs.
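As a minimal sketch of how the four pairwise metrics named in the abstract could be assembled into one feature vector, the Python below uses stand-in definitions that are assumptions, not the paper's: Jaccard overlap for research interest and affiliation similarity, the sum of h-indices as the performance aggregation, and Jaccard overlap of co-author sets as node similarity. The author records, field names, and the `pair_features` helper are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(u, v, authors, coauthors):
    """Return a four-metric feature vector for the author pair (u, v).

    Every definition here is an illustrative assumption.
    """
    au, av = authors[u], authors[v]
    return [
        # Research interest similarity (assumed: keyword-set overlap).
        jaccard(au["interests"], av["interests"]),
        # Affiliation similarity (assumed: institution-set overlap).
        jaccard(au["affiliations"], av["affiliations"]),
        # Aggregated research performance index (assumed: sum of h-indices).
        au["h_index"] + av["h_index"],
        # Node similarity (assumed: overlap of the two co-author sets).
        jaccard(coauthors.get(u, ()), coauthors.get(v, ())),
    ]


# Hypothetical toy records; none of these values come from ArnetMiner or DBLP.
authors = {
    "a": {"interests": {"graph mining", "link prediction"},
          "affiliations": {"Univ X"}, "h_index": 10},
    "b": {"interests": {"link prediction", "recommender systems"},
          "affiliations": {"Univ X"}, "h_index": 4},
}
coauthors = {"a": {"c", "d"}, "b": {"c"}}

features = pair_features("a", "b", authors, coauthors)
print(features)
```

The third entry is on a different scale from the three similarity scores, so a real pipeline would likely normalize it before feeding a scale-sensitive classifier.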
Problem

Research questions and friction points this paper is trying to address.

Predict future research collaborations in co-authorship networks
Assess impact of research interest and affiliation similarity on link prediction
Integrate research performance indices and node similarity for improved prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Supervised learning for co-authorship link prediction
Integrates research interest and affiliation similarity
Aggregates research performance indices as metrics
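
A supervised link predictor over such pairwise metrics can be sketched as follows. The example trains a plain logistic regression by gradient descent on hand-made, pre-scaled feature vectors labeled link (1) or no link (0). Logistic regression is a dependency-free stand-in chosen for illustration, and the toy data, function names, and hyperparameters are all assumptions rather than the paper's setup.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Estimated probability that the author pair with features x will link."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical pairs: [interest sim, affiliation sim, scaled performance, node sim]
X = [
    [0.9, 1.0, 0.7, 0.6],
    [0.8, 1.0, 0.5, 0.5],
    [0.7, 0.0, 0.9, 0.4],
    [0.1, 0.0, 0.2, 0.0],
    [0.2, 0.0, 0.4, 0.1],
    [0.0, 1.0, 0.1, 0.0],
]
y = [1, 1, 1, 0, 0, 0]  # 1 = the pair later co-authored, 0 = it did not

w, b = train_logistic(X, y)
for x, label in zip(X, y):
    print(label, round(predict_proba(w, b, x), 3))
```

In practice the labels would come from observed co-authorships, and a tree ensemble could replace the linear model without changing the feature construction.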
Doaa Hassan
Luddy School of Informatics, Computing and Engineering, Indiana University - Indianapolis, United States; Computers and Systems Department, National Telecommunication Institute, Cairo, Egypt
Mohammad Al Hasan
Professor of Computer Science, Indiana University Indianapolis, USA
Data Mining, Network Analysis, Graph Machine Learning