🤖 AI Summary
Graph neural network (GNN)-based recommender systems suffer from unreliable and overconfident predictions under noise accumulation and data sparsity. To address this, we propose the first prediction confidence quantification and calibration framework specifically designed for GNN recommenders. Our method introduces three key innovations: (1) a dynamic score calibration mechanism that incorporates user-specific bias modeling; (2) a confidence-aware loss function tailored for negative sampling, explicitly penalizing overconfidence; and (3) robust message propagation modeling to stabilize feature aggregation under noisy conditions. Extensive experiments on multiple public benchmark datasets demonstrate significant improvements: average Recall@20 increases by 3.2%, while Expected Calibration Error (ECE) decreases by 41.7%, confirming enhanced accuracy and calibration reliability—particularly in high-noise scenarios. The framework thus improves both predictive trustworthiness and robustness of GNN-based recommendation.
📝 Abstract
Recommender systems based on graph neural networks perform well in tasks such as rating and ranking. However, in real-world recommendation scenarios, noise such as user misuse and malicious advertisement gradually accumulates through the message propagation mechanism. Even if existing studies mitigate their effects by reducing the noise propagation weights, the severe sparsity of the recommender system still leads to the low-weighted noisy neighbors being mistaken as meaningful information, and the prediction result obtained based on the polluted nodes is not entirely trustworthy. Therefore, it is crucial to measure the confidence of the prediction results in this highly noisy framework. Furthermore, our evaluation of the existing representative GNN-based recommendation shows that it suffers from overconfidence. Based on the above considerations, we propose a new method to quantify and calibrate the prediction confidence of GNN-based recommendations (Conf-GNNRec). Specifically, we propose a rating calibration method that dynamically adjusts excessive ratings to mitigate overconfidence based on user personalization. We also design a confidence loss function to reduce the overconfidence of negative samples and effectively improve recommendation performance. Experiments on public datasets demonstrate the validity of Conf-GNNRec in prediction confidence and recommendation performance.