🤖 AI Summary
Graph Neural Networks (GNNs) struggle to capture complex structural patterns, long-range dependencies, and temporal dynamics in multi-table relational data; moreover, existing positional encoding and tokenization schemes generalize poorly and lose topological information. Method: We propose the first Graph Transformer architecture tailored for relational entity graphs. It introduces a novel five-element node tokenization strategy—incorporating features, entity types, hop distances, timestamps, and local structural contexts—and unifies heterogeneous, temporal, and topologically faithful modeling via hybrid attention: subgraph-local attention coupled with learnable global centroid attention. Contribution/Results: Our method consistently matches or outperforms GNN baselines across all 21 tasks in RelBench, with up to 18% absolute gains, establishing the Graph Transformer as both state-of-the-art and practically viable for relational deep learning.
📝 Abstract
Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing the complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transformers have emerged as powerful alternatives to GNNs on general graphs, applying them to relational entity graphs presents unique challenges: (i) traditional positional encodings fail to generalize to massive, heterogeneous graphs; (ii) existing architectures cannot model the temporal dynamics and schema constraints of relational data; (iii) existing tokenization schemes lose critical structural information. Here we introduce the Relational Graph Transformer (RelGT), the first graph transformer architecture designed specifically for relational entity graphs. RelGT employs a novel multi-element tokenization strategy that decomposes each node into five components (features, type, hop distance, time, and local structure), enabling efficient encoding of heterogeneity, temporality, and topology without expensive precomputation. Our architecture combines local attention over sampled subgraphs with global attention to learnable centroids, incorporating both local and database-wide representations. Across 21 tasks from the RelBench benchmark, RelGT consistently matches or outperforms GNN baselines, with gains of up to 18%, establishing Graph Transformers as a powerful architecture for Relational Deep Learning.
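To make the two core ideas in the abstract concrete, the sketch below illustrates (a) fusing the five token elements (features, type, hop distance, time, local structure) into a single node token and (b) hybrid attention that combines local attention over a sampled subgraph with attention to learnable global centroids. This is a minimal, single-head NumPy illustration, not the paper's implementation: all dimensions (embedding width 16, 3 entity types, 4 hop buckets, etc.) are hypothetical, the projection tables are random stand-ins for parameters that would be trained end-to-end, and the attention omits query/key/value projections and multi-head structure.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # shared token width (illustrative choice)
NUM_TYPES = 3   # entity types in a toy schema
MAX_HOPS = 4    # hop distances encoded categorically

# Stand-ins for learnable parameters (random here, trained in the real model).
type_emb = rng.normal(size=(NUM_TYPES, D))
hop_emb = rng.normal(size=(MAX_HOPS, D))
feat_proj = rng.normal(size=(8, D))      # raw node features (dim 8) -> D
time_proj = rng.normal(size=(1, D))      # scalar relative timestamp -> D
struct_proj = rng.normal(size=(4, D))    # local-structure descriptor -> D

def node_token(feat, etype, hop, rel_time, struct_vec):
    """Fuse the five element encodings into one token by summation
    (one common fusion choice; the exact combination is an assumption)."""
    return (feat @ feat_proj
            + type_emb[etype]
            + hop_emb[hop]
            + np.array([rel_time]) @ time_proj
            + struct_vec @ struct_proj)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_attention(tokens, centroids):
    """Local attention among subgraph tokens plus global attention to centroids."""
    local = softmax(tokens @ tokens.T / np.sqrt(D)) @ tokens
    global_ = softmax(tokens @ centroids.T / np.sqrt(D)) @ centroids
    return local + global_

# Toy input: 5 sampled subgraph nodes, 6 learnable global centroids.
tokens = np.stack([
    node_token(rng.normal(size=8), etype=i % NUM_TYPES, hop=i % MAX_HOPS,
               rel_time=-float(i), struct_vec=rng.normal(size=4))
    for i in range(5)
])
centroids = rng.normal(size=(6, D))
out = hybrid_attention(tokens, centroids)
print(out.shape)  # one updated D-dim representation per sampled node: (5, 16)
```

Because the centroid count is fixed and independent of database size, the global term stays cheap while still exposing database-wide context to every local token.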