AI Summary
Existing evaluation protocols for temporal knowledge graph reasoning weight all events equally, overlooking that most events are mundane repetitions, and therefore fail to assess a model's capacity for deeper reasoning on rare, striking events. This work introduces the notion of event strikingness and proposes a rule-based strikingness measuring framework (RSMF), which uses temporal rules to identify an event's peer events and quantifies how unexpected the event is relative to them, enabling strikingness-weighted evaluation metrics such as weighted MRR and Hits@k. Experiments on four benchmark datasets show that representative models perform worse as event strikingness increases; path-based methods excel on low-strikingness events, whereas representation-based approaches perform better on high-strikingness ones. Moreover, the gains from the proposed ensemble method stem primarily from better fitting of trivial events rather than from improved reasoning.
Abstract
Temporal Knowledge Graph Reasoning (TKGR) aims at inferring missing (especially future) events from historical data. Current evaluation in TKGR weights all events uniformly, ignoring that most are trivial repetitions, and thus overestimates true reasoning ability. Therefore, the rare outstanding events, whose prediction demands deeper reasoning, should be distinguished and emphasized. To this end, we propose a strikingness-aware evaluation framework, which introduces a rule-based strikingness measuring framework (RSMF) to quantify an event's strikingness by comparing its expected occurrence with that of peer events derived from temporal rules. Strikingness is then integrated as a weighting factor into metrics such as weighted MRR and Hits@k. Experiments on four TKG benchmarks reveal that: 1) all representative models perform worse as event strikingness increases; 2) path-based methods excel on low-strikingness events, and representation-based ones on high-strikingness events; 3) the gains of an ensemble method we design stem from fitting trivial events rather than from improved reasoning. Our framework provides a more rigorous evaluation, refocusing the field on predicting outstanding events.
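To make the idea of strikingness-weighted metrics concrete, here is a minimal Python sketch. The abstract does not give the exact weighting formula, so the sketch assumes the simplest form: each test event's contribution is multiplied by its strikingness weight, and the result is normalized by the sum of the weights. The function names `weighted_mrr` and `weighted_hits_at_k` and the example ranks and weights are hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Sequence


def _check(ranks: Sequence[int], weights: Sequence[float]) -> float:
    """Validate inputs and return the total weight (assumed normalizer)."""
    if len(ranks) != len(weights):
        raise ValueError("ranks and weights must have the same length")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return total


def weighted_mrr(ranks: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted mean reciprocal rank: sum(w_i / rank_i) / sum(w_i).

    Assumption: weights are per-event strikingness scores; uniform weights
    recover the standard MRR.
    """
    total = _check(ranks, weights)
    return sum(w / r for r, w in zip(ranks, weights)) / total


def weighted_hits_at_k(
    ranks: Sequence[int], weights: Sequence[float], k: int
) -> float:
    """Weighted Hits@k: weight mass of events ranked within the top k,
    divided by the total weight."""
    total = _check(ranks, weights)
    return sum(w for r, w in zip(ranks, weights) if r <= k) / total


# Hypothetical example: two trivial events predicted well and one
# striking event predicted poorly.
ranks = [1, 2, 10]
uniform = [1.0, 1.0, 1.0]
strikingness = [0.1, 0.1, 0.8]

print(weighted_mrr(ranks, uniform))
print(weighted_mrr(ranks, strikingness))
print(weighted_hits_at_k(ranks, strikingness, k=3))
```

In this toy case the uniform score is dominated by the two easy, repetitive events, while the weighted score drops because most of the weight sits on the poorly ranked striking event. This is the kind of gap the abstract describes between uniform and strikingness-aware evaluation.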