How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion

📅 2025-12-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing knowledge graph completion (KGC) evaluation overlooks two critical dimensions: prediction sharpness—the strictness of individual predictions—and robustness to popularity bias—the model’s generalization capability over low-popularity entities. This paper introduces PROBE, the first unified evaluation framework that jointly models both aspects. PROBE employs a Rank Transformer (RT) to dynamically calibrate prediction score strictness and a Popularity-aware Rank Aggregator (RA) to enable fine-grained, fairness-aware score aggregation. Experiments across multiple real-world datasets demonstrate that PROBE effectively mitigates performance overestimation or underestimation induced by popularity bias in conventional metrics (e.g., MRR, Hits@k). It significantly enhances evaluation reliability and model ranking stability, establishing a more scientific, interpretable, and multidimensional benchmark for KGC assessment.

📝 Abstract
Knowledge graph completion (KGC) aims to predict missing facts from the observed KG. While a number of KGC models have been studied, the evaluation of KGC still remains underexplored. In this paper, we observe that existing metrics overlook two key perspectives for KGC evaluation: (A1) predictive sharpness -- the degree of strictness in evaluating an individual prediction, and (A2) popularity-bias robustness -- the ability to predict low-popularity entities. Toward reflecting both perspectives, we propose a novel evaluation framework (PROBE), which consists of a rank transformer (RT) estimating the score of each prediction based on a required level of predictive sharpness and a rank aggregator (RA) aggregating all the scores in a popularity-aware manner. Experiments on real-world KGs reveal that existing metrics tend to over- or under-estimate the accuracy of KGC models, whereas PROBE yields a comprehensive understanding of KGC models and reliable evaluation results.
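The conventional metrics the abstract refers to (MRR and Hits@k) are computed from the 1-based ranks a model assigns to the correct entities; a minimal reference implementation:

```python
# Conventional KGC metrics: for each test triple the model ranks all
# candidate entities; MRR and Hits@k summarize the resulting rank list.

def mrr(ranks):
    """Mean reciprocal rank over a list of 1-based ranks."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of predictions whose rank is at most k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 3, 10, 50, 2]
print(round(mrr(ranks), 3))   # 0.391
print(hits_at_k(ranks, 10))   # 0.8
```

Note that both metrics apply the same fixed strictness to every prediction and weight every test triple equally, which is exactly the pair of blind spots (A1 and A2) the paper targets.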
Problem

Research questions and friction points this paper is trying to address.

Existing KGC metrics overlook predictive sharpness, i.e. how strictly an individual prediction is judged.
They also ignore popularity-bias robustness, i.e. a model's ability to predict low-popularity entities.
As a result, conventional metrics over- or under-estimate the accuracy of KGC models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rank transformer (RT) scores each prediction at a required level of predictive sharpness.
Rank aggregator (RA) aggregates the per-prediction scores in a popularity-aware manner.
Together, RT and RA let PROBE assess both sharpness and bias robustness in one framework.
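To make the RT/RA split concrete, here is an illustrative sketch of how a sharpness-parameterized rank score and a popularity-weighted aggregation could be combined. The specific formulas (score(r) = r^(-sharpness) and inverse-popularity weights) are assumptions for illustration only, not the paper's actual RT and RA definitions:

```python
from collections import Counter

def probe_style_score(ranks, entities, train_entities, sharpness=1.0):
    """Illustrative sketch, not the paper's method.

    Rank-transformer stand-in: score(r) = r ** (-sharpness), so larger
    `sharpness` judges each prediction more strictly (sharpness=1.0
    recovers the reciprocal rank).
    Rank-aggregator stand-in: each test triple is weighted inversely to
    its target entity's popularity in the training data, so rare
    entities contribute more to the final score.
    """
    pop = Counter(train_entities)                   # entity frequency in training data
    weights = [1.0 / (1 + pop[e]) for e in entities]
    scores = [r ** (-sharpness) for r in ranks]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Two predictions: rank 1 for popular entity 'a', rank 2 for unseen 'b'.
value = probe_style_score([1, 2], ["a", "b"], ["a", "a", "a"])
print(value)  # 0.6
```

Under this sketch, raising `sharpness` penalizes near-miss ranks more heavily, and the popularity weights prevent a model that only predicts frequent entities from dominating the aggregate score.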