🤖 AI Summary
Existing heterogeneous graph neural networks (HGNNs) struggle to model the intrinsic high-order structures of relational databases, leading to suboptimal predictive performance. To address this, the authors propose RelGNN, a heterogeneous GNN framework tailored for prediction tasks over relational databases. Its core innovation is the abstraction of relational schemas into *atomic routes*—sequences of nodes forming high-order tripartite structures—which enable a *composite message-passing mechanism* supporting direct, single-hop interactions among heterogeneous nodes, thereby avoiding the redundant aggregation and information entanglement inherent in conventional multi-hop schemes. Evaluated on 30 real-world tasks from RelBench, RelGNN consistently achieves state-of-the-art accuracy, with improvements of up to 25%.
📝 Abstract
Predictive tasks on relational databases are critical in real-world applications spanning e-commerce, healthcare, and social media. To address these tasks effectively, Relational Deep Learning (RDL) encodes relational data as graphs, enabling Graph Neural Networks (GNNs) to exploit relational structures for improved predictions. However, existing heterogeneous GNNs often overlook the intrinsic structural properties of relational databases, leading to modeling inefficiencies. Here we introduce RelGNN, a novel GNN framework specifically designed to capture the unique characteristics of relational databases. At the core of our approach is the introduction of atomic routes, which are sequences of nodes forming high-order tripartite structures. Building upon these atomic routes, RelGNN introduces a new composite message passing mechanism that allows direct single-hop interactions between heterogeneous nodes. This approach avoids redundant aggregations and mitigates information entanglement, ultimately leading to more efficient and accurate predictive modeling. RelGNN is evaluated on 30 diverse real-world tasks from RelBench (Fey et al., 2024), and consistently achieves state-of-the-art accuracy with up to 25% improvement.
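To make the composite message-passing idea concrete, here is a minimal sketch, assuming a simplified setting: each atomic route is a triple `(s, b, d)` linking a source-type node and a destination-type node through a bridge node of a third type, and the bridge receives one joint message from both endpoints in a single hop. The function name, the mean-style aggregation, and the residual update are illustrative assumptions, not RelGNN's actual formulation.

```python
import numpy as np

def composite_message_passing(h_src, h_bridge, h_dst, routes):
    """Toy composite message passing along atomic routes.

    h_src, h_bridge, h_dst: (num_nodes, dim) feature matrices, one
        per node type in the tripartite structure.
    routes: iterable of (s, b, d) index triples; each is one atomic
        route. The bridge node b aggregates a single joint message
        built from BOTH endpoints s and d, instead of two separate
        multi-hop aggregations (hypothetical aggregation choice).
    """
    agg = np.zeros_like(h_bridge)
    cnt = np.zeros((h_bridge.shape[0], 1))
    for s, b, d in routes:
        # Joint single-hop message from both route endpoints.
        agg[b] += 0.5 * (h_src[s] + h_dst[d])
        cnt[b] += 1.0
    cnt = np.maximum(cnt, 1.0)  # bridges with no routes stay unchanged
    return h_bridge + agg / cnt
```

A bridge node touched by several routes averages their joint messages, so endpoint information reaches it directly rather than being mixed through intermediate multi-hop aggregation steps.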