🤖 AI Summary
Existing evaluations of temporal graph neural networks (TGNNs) do not systematically test whether the models capture three critical temporal patterns: periodicity, causality, and long-range dependencies. To address this gap, we introduce T-GRAB, the first synthetic benchmark explicitly designed for diagnostic evaluation of temporal reasoning in TGNNs. Using controllable graph generation, T-GRAB constructs tasks with explicit temporal logic, so that each of the three capabilities can be assessed independently of the others. Experiments on 11 state-of-the-art TGNNs show that most models perform poorly on the long-range dependency and causal reasoning tasks, exposing fundamental limitations in how they handle time. Unlike real-world datasets, where model weaknesses are hard to isolate, T-GRAB provides an interpretable, reproducible diagnostic framework. It thereby lays a foundation for both rigorous assessment and principled architectural innovation in TGNNs.
📝 Abstract
Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.
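To make the idea of a controlled periodic task concrete, here is a minimal sketch of how a periodic sequence of graph snapshots could be generated. It is an illustration only and is not taken from the T-GRAB repository: the function names (`random_snapshot`, `periodic_sequence`), the parameters, and the choice of uniformly random undirected edges are all assumptions made for this example.

```python
import random


def random_snapshot(num_nodes, num_edges, rng):
    """Sample a set of undirected edges (u, v) with u < v (hypothetical helper)."""
    all_pairs = [(u, v) for u in range(num_nodes) for v in range(u + 1, num_nodes)]
    return frozenset(rng.sample(all_pairs, num_edges))


def periodic_sequence(num_nodes, num_edges, period, num_steps, seed=0):
    """Build a snapshot sequence that repeats every `period` steps.

    Assumption for illustration: one random snapshot is drawn per phase,
    and the snapshot at time t is the one for phase t % period.
    """
    rng = random.Random(seed)
    base = [random_snapshot(num_nodes, num_edges, rng) for _ in range(period)]
    return [base[t % period] for t in range(num_steps)]


# A model that has learned the period can predict the edges at time t
# from the edges it observed at time t - period.
snapshots = periodic_sequence(num_nodes=6, num_edges=4, period=3, num_steps=12)
```

Because the generating rule is known exactly, a prediction error on such a sequence can be attributed to the model's handling of periodicity and not to noise in the data, which is the kind of isolation the abstract describes.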