🤖 AI Summary
This paper addresses early citation prediction for newly published papers, where sparse initial citation signals and a highly long-tailed citation distribution lead to low prediction accuracy and severe bias against low-citation papers. To tackle this, it proposes a bias-aware framework that combines multi-agent feature extraction with graph representation learning. Methodologically, the framework derives fine-grained, interpretable factors of scientific impact to model latent influence mechanisms, uses a two-stage forward process, and integrates heterogeneous network embeddings, GroupDRO-based robust optimization, and a causal regularization head to jointly achieve debiased, interpretable, and robust modeling. Evaluated on two real-world datasets, the approach reduces MALE and RMSLE by around 13% and improves NDCG by 5.5% over the baseline methods. It is aimed in particular at fairer and more stable predictions for long-tailed, low-citation papers.
📝 Abstract
As a key to assessing research impact, citation dynamics underpins research evaluation, scholarly recommendation, and the study of knowledge diffusion. Citation prediction is particularly critical for newly published papers, where early assessment must be performed without citation signals and under highly long-tailed distributions. We identify two key research gaps: (i) insufficient modeling of implicit factors of scientific impact, leading to reliance on coarse proxies; and (ii) a lack of bias-aware learning that can deliver stable predictions on low-citation papers. We address these gaps by proposing a Bias-Aware Citation Prediction Framework, which combines multi-agent feature extraction with robust graph representation learning. First, a multi-agent × graph co-learning module derives fine-grained, interpretable signals, such as reproducibility, collaboration network, and text quality, from metadata and external resources, and fuses them with heterogeneous-network embeddings to provide rich supervision even in the absence of early citation signals. Second, we incorporate a set of robustness mechanisms: a two-stage forward process that routes explicit factors through an intermediate exposure estimate, GroupDRO to optimize worst-case group risk across environments, and a regularization head that performs what-if analyses on controllable factors under monotonicity and smoothness constraints. Comprehensive experiments on two real-world datasets demonstrate the effectiveness of the proposed model. Specifically, it achieves around a 13% reduction in the error metrics (MALE and RMSLE) and a 5.5% improvement in the ranking metric (NDCG) over the baseline methods.
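To make the GroupDRO step in the abstract more concrete, the following is a minimal sketch of the generic GroupDRO weighting idea: groups with higher loss are up-weighted so that the objective tracks worst-case group risk. It is not the paper's implementation. The grouping into citation buckets, the toy error values, the step size, and all function names are assumptions made for illustration only.

```python
import math


def group_losses(errors, groups, n_groups):
    """Mean per-sample error for each group; empty groups get 0.0."""
    sums = [0.0] * n_groups
    counts = [0] * n_groups
    for err, g in zip(errors, groups):
        sums[g] += err
        counts[g] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def update_group_weights(weights, losses, step_size=0.1):
    """Exponentiated-gradient step: groups with larger loss gain weight."""
    raw = [w * math.exp(step_size * l) for w, l in zip(weights, losses)]
    total = sum(raw)
    return [r / total for r in raw]


def robust_loss(weights, losses):
    """Weighted sum of group losses; this is what the model would minimize."""
    return sum(w * l for w, l in zip(weights, losses))


# Toy data (hypothetical): per-paper absolute log errors and a group id,
# where 0 = highly cited bucket and 1 = low-citation bucket.
errors = [0.2, 0.1, 0.3, 1.5, 1.1]
groups = [0, 0, 0, 1, 1]

losses = group_losses(errors, groups, n_groups=2)
weights = [0.5, 0.5]  # start from uniform group weights

# In real training the losses change every step as the model updates;
# here they are held fixed to show where the weights drift.
for _ in range(50):
    weights = update_group_weights(weights, losses)

print("group losses:", losses)
print("group weights:", weights)
print("robust loss:", robust_loss(weights, losses))
```

With the losses held fixed, the weight mass drifts toward the worse-performing group, so the weighted loss approaches the worst-group loss instead of the average. In an actual training loop this weighted loss would be backpropagated through the model at each step.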