T-GRAB: A Synthetic Diagnostic Benchmark for Learning on Temporal Graphs

📅 2025-07-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing evaluations do not systematically test whether temporal graph neural networks (TGNNs) can model three critical temporal patterns: periodicity, causality, and long-range dependencies. To address this gap, we introduce T-GRAB, the first synthetic benchmark explicitly designed for diagnostic evaluation of temporal reasoning in TGNNs. Using controllable graph generation, T-GRAB constructs tasks with explicit temporal logic, so that each of the three capabilities can be assessed independently of the others. Experiments across 11 TGNNs reveal that most models fail on long-range dependency and causal reasoning tasks, exposing fundamental limitations in how they model time. Unlike real-world datasets, where model weaknesses are hard to isolate, T-GRAB provides an interpretable, reproducible diagnostic framework that supports both rigorous assessment and principled architectural innovation in TGNNs.

📝 Abstract

Dynamic graph learning methods have recently emerged as powerful tools for modelling relational data evolving through time. However, despite extensive benchmarking efforts, it remains unclear whether current Temporal Graph Neural Networks (TGNNs) effectively capture core temporal patterns such as periodicity, cause-and-effect, and long-range dependencies. In this work, we introduce the Temporal Graph Reasoning Benchmark (T-GRAB), a comprehensive set of synthetic tasks designed to systematically probe the capabilities of TGNNs to reason across time. T-GRAB provides controlled, interpretable tasks that isolate key temporal skills: counting/memorizing periodic repetitions, inferring delayed causal effects, and capturing long-range dependencies over both spatial and temporal dimensions. We evaluate 11 temporal graph learning methods on these tasks, revealing fundamental shortcomings in their ability to generalize temporal patterns. Our findings offer actionable insights into the limitations of current models, highlight challenges hidden by traditional real-world benchmarks, and motivate the development of architectures with stronger temporal reasoning abilities. The code for T-GRAB can be found at: https://github.com/alirezadizaji/T-GRAB.
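
To make the periodic-repetition task described in the abstract concrete, here is a minimal sketch of what a controlled, periodic temporal graph could look like. It is an illustration only and is not taken from the T-GRAB code base: the function name `periodic_snapshots`, its parameters, and the choice of random base snapshots are all assumptions made for this example.

```python
import random


def periodic_snapshots(num_nodes, period, num_steps, edge_prob=0.3, seed=0):
    """Hypothetical generator: snapshot t is identical to snapshot t mod period."""
    rng = random.Random(seed)
    base = []
    for _ in range(period):
        # Sample one random undirected graph per position in the period.
        edges = frozenset(
            (u, v)
            for u in range(num_nodes)
            for v in range(u + 1, num_nodes)
            if rng.random() < edge_prob
        )
        base.append(edges)
    # Repeat the base pattern until num_steps snapshots have been produced.
    return [base[t % period] for t in range(num_steps)]


snapshots = periodic_snapshots(num_nodes=6, period=3, num_steps=9)
```

A model that has memorized the period should predict the edges of snapshot `t` from what it observed at `t - period`; lengthening `period` in such a setup would make the task harder in a controlled way.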

Problem

Research questions and friction points this paper is trying to address.

Evaluating TGNNs' ability to capture temporal patterns like periodicity and causality

Assessing TGNNs' performance on long-range dependencies in temporal graphs

Identifying limitations of current models in generalizing temporal patterns
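
The delayed cause-and-effect pattern named above can be sketched in a similar spirit. The rule used here (a "cause" edge from `u` to `v` at time `t` triggers an "effect" edge from `v` to `u` at time `t + delay`) is a hypothetical choice for illustration, not the rule used in the paper, and `delayed_effect_stream` is an invented name.

```python
import random


def delayed_effect_stream(num_nodes, num_steps, delay, seed=0):
    """Hypothetical event stream of (time, src, dst, kind) tuples."""
    rng = random.Random(seed)
    events = []
    for t in range(num_steps):
        # One random cause edge per time step.
        u, v = rng.sample(range(num_nodes), 2)
        events.append((t, u, v, "cause"))
        # Assumed rule: the reversed edge appears exactly `delay` steps later.
        if t + delay < num_steps:
            events.append((t + delay, v, u, "effect"))
    # Order the stream chronologically, as a temporal graph learner would see it.
    events.sort(key=lambda e: e[0])
    return events


events = delayed_effect_stream(num_nodes=5, num_steps=10, delay=3)
```

In a diagnostic setting of this kind, a model would be asked to predict the "effect" edges, which is only possible if it retains the corresponding "cause" edge for `delay` steps.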

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces T-GRAB benchmark for temporal graphs

Evaluates 11 methods on key temporal skills

Reveals shortcomings in current TGNN models

🔎 Similar Papers

From Link Prediction to Forecasting: Addressing Challenges in Batch-based Temporal Graph Learning