Strikingness-Aware Evaluation for Temporal Knowledge Graph Reasoning

📅 2026-05-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing evaluation protocols for temporal knowledge graph reasoning weight all events equally. Because most events are mundane repetitions, these protocols fail to assess a model's capacity for deeper reasoning on rare, striking events. This work introduces the notion of event strikingness and proposes a rule-based strikingness measuring framework (RSMF), which quantifies an event's strikingness by comparing its expected occurrence with peer events derived from temporal rules. Strikingness then serves as a weighting factor in metrics such as weighted MRR and Hits@k. Experiments on four benchmark datasets show that all representative models perform worse as event strikingness increases; path-based methods excel on low-strikingness events, whereas representation-based methods do better on high-strikingness ones. Moreover, the gains of an ensemble method stem primarily from better fitting of trivial events rather than from improved reasoning.
📝 Abstract
Temporal Knowledge Graph Reasoning (TKGR) aims to infer missing (especially future) events from historical data. Current TKGR evaluation weights all events uniformly, ignoring that most are trivial repetitions, and therefore overestimates true reasoning ability. The rare outstanding events, whose prediction demands deeper reasoning, should thus be distinguished and emphasized. To this end, we propose a strikingness-aware evaluation framework, which introduces a rule-based strikingness measuring framework (RSMF) to quantify an event's strikingness by comparing its expected occurrence with peer events derived from temporal rules. Strikingness is then integrated as a weighting factor into metrics such as weighted MRR and Hits@k. Experiments on four TKG benchmarks reveal that 1) all representative models perform worse as event strikingness increases, 2) path-based methods excel on low-strikingness events and representation-based ones on high-strikingness events, and 3) the gains of an ensemble method we design stem from fitting trivial events rather than from improved reasoning. Our framework provides a more rigorous evaluation, refocusing the field on predicting outstanding events.
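The abstract does not spell out how strikingness enters the metrics, so the following is only a minimal sketch of one plausible reading: each test event's reciprocal rank (or hit indicator) is scaled by a per-event strikingness weight and the sum is normalized by the total weight. The function names, the normalization, and the example weights are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def _check(ranks: Sequence[int], weights: Sequence[float]) -> float:
    """Validate inputs and return the total weight (hypothetical helper)."""
    if len(ranks) != len(weights):
        raise ValueError("ranks and weights must have the same length")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return total


def weighted_mrr(ranks: Sequence[int], weights: Sequence[float]) -> float:
    """Assumed form: sum_i(w_i / rank_i) / sum_i(w_i)."""
    total = _check(ranks, weights)
    return sum(w / r for r, w in zip(ranks, weights)) / total


def weighted_hits_at_k(ranks: Sequence[int], weights: Sequence[float], k: int) -> float:
    """Assumed form: total weight of events ranked within the top k, over all weight."""
    total = _check(ranks, weights)
    return sum(w for r, w in zip(ranks, weights) if r <= k) / total


if __name__ == "__main__":
    # Illustrative data only: two trivial events ranked well, one striking event ranked poorly.
    ranks = [1, 2, 10]
    uniform = [1.0, 1.0, 1.0]
    striking = [0.1, 0.1, 0.8]  # hypothetical strikingness scores
    print(weighted_mrr(ranks, uniform), weighted_mrr(ranks, striking))
    print(weighted_hits_at_k(ranks, uniform, 3), weighted_hits_at_k(ranks, striking, 3))
```

With uniform weights the functions reduce to the ordinary MRR and Hits@k. Under the hypothetical strikingness weights, the poorly ranked striking event dominates the score, which is the effect the abstract attributes to strikingness-aware evaluation.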
Problem

Research questions and friction points this paper is trying to address.

Temporal Knowledge Graph Reasoning
Evaluation Framework
Event Strikingness
Reasoning Ability
Outstanding Events
Innovation

Methods, ideas, or system contributions that make the work stand out.

strikingness-aware evaluation
temporal knowledge graph reasoning
rule-based strikingness measuring framework
weighted evaluation metrics
outstanding event prediction