🤖 AI Summary
Existing benchmarks for temporal knowledge graphs suffer from data biases and oversimplified tasks, leading models to rely on co-occurrence statistics rather than genuine temporal evolution modeling, thereby hindering fair evaluation of a model’s capacity to understand knowledge dynamics. This work systematically analyzes these limitations and introduces the first benchmark specifically designed for temporal evolution modeling, comprising four debiased datasets and two novel tasks that closely align with real-world evolution mechanisms—emphasizing knowledge obsolescence and precise temporal reasoning. By redefining temporal interval representations and task formulations, the study reveals that current high Hits@10 scores often stem from dataset shortcuts rather than true reasoning capabilities. The proposed benchmark is publicly released to foster more reliable research on temporal knowledge graph evolution.
📝 Abstract
Temporal knowledge graphs (TKGs) structurally preserve evolving human knowledge. Recent research has focused on designing models that learn the evolutionary nature of TKGs to predict future facts, achieving impressive results, e.g., Hits@10 scores above 0.9 on the YAGO dataset. However, we find that existing benchmarks inadvertently introduce a shortcut: near state-of-the-art performance can be achieved simply by counting co-occurrences, without using any temporal information. In this work, we examine the root cause of this issue, identifying inherent biases in current datasets and an oversimplified evaluation task that these biases can exploit. Through this analysis, we further uncover additional limitations of existing benchmarks, including unreasonable formatting of time-interval knowledge, neglect of knowledge obsolescence, and insufficient information for precise evolution understanding, all of which amplify the shortcut and hinder fair assessment. We therefore introduce the TKG evolution benchmark, which includes four bias-corrected datasets and two novel tasks closely aligned with the evolution process, promoting a more accurate understanding of the challenges in TKG evolution modeling. The benchmark is available at: https://github.com/zjs123/TKG-Benchmark.
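To make the co-occurrence shortcut concrete, here is a minimal sketch (not the paper's actual baseline implementation; function and variable names are illustrative) of how a frequency-counting heuristic can answer a `(subject, relation, ?)` query while completely ignoring timestamps:

```python
from collections import Counter

def train_counts(quads):
    """Count how often each (subject, relation, object) triple appears
    in the training quadruples, discarding the timestamp entirely."""
    counts = Counter()
    for s, r, o, t in quads:
        counts[(s, r, o)] += 1
    return counts

def rank_objects(counts, s, r, candidates):
    """Rank candidate objects for a (s, r, ?) query purely by how often
    each (s, r, o) co-occurred historically -- no temporal reasoning."""
    return sorted(candidates, key=lambda o: counts[(s, r, o)], reverse=True)

# Toy example: a frequently repeated fact dominates the ranking
# regardless of when the query time falls.
quads = [
    ("A", "visits", "B", 1),
    ("A", "visits", "B", 2),
    ("A", "visits", "C", 3),
]
counts = train_counts(quads)
print(rank_objects(counts, "A", "visits", ["B", "C", "D"]))  # ['B', 'C', 'D']
```

On benchmarks where facts recur with skewed frequencies, a ranking like this can place the true object high in the candidate list, which is why metrics such as Hits@10 can look strong without any modeling of temporal evolution.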