AI Summary
End-to-end representation learning for multi-table relational databases remains challenging due to the need for manual feature engineering and the lack of unified structural abstractions across heterogeneous, time-evolving tables.
Method: We propose Relational Deep Learning (RDL), a framework that bypasses traditional feature engineering by formally introducing the *temporal heterogeneous relational entity graph*: a unified graph structure where primary-foreign key relationships define edges, schema constraints govern node/edge types, and timestamps serve as dynamic attributes. RDL integrates graph neural networks (GNNs), relational algebraic modeling, temporal graph learning, and heterogeneous graph architectures to enable cross-table joint representation learning.
Contribution: This work establishes the first theoretical foundation and technical roadmap for RDL, systematically identifying core challenges and curating benchmark datasets. It advances graph representation learning toward relational data foundation models and introduces a novel paradigm for large-scale, multi-table joint modeling.
Abstract
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data and has been applied to molecules, social networks, recommendation systems, and transportation, among other domains. Data in multi-tabular relational databases can also be constructed as 'relational entity graphs' for Relational Deep Learning (RDL), a new blueprint that enables end-to-end representation learning without traditional feature engineering. Compared to arbitrary graph-structured data, relational entity graphs have key properties: (i) their structure is defined by primary-foreign key relationships between entities in different tables, (ii) the structural connectivity is a function of the relational schema defining a database, and (iii) the graph connectivity is temporal and heterogeneous in nature. In this paper, we provide a comprehensive review of RDL by first introducing the representation of relational databases as relational entity graphs, and then reviewing public benchmark datasets that have been used to develop and evaluate recent GNN-based RDL models. We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data, while also surveying foundational neural network methods and recent architectural advances specialized for relational entity graphs. Finally, we explore opportunities to unify these distinct modeling challenges, highlighting how RDL converges multiple sub-fields in graph machine learning towards the design of foundation models that can transform the processing of relational data.
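As a rough illustration of properties (i) to (iii), the sketch below builds a tiny relational entity graph from two toy tables in plain Python. Everything in it is an assumption made for illustration and is not taken from the paper: the `customers` and `orders` tables, the `foreign_keys` schema list, and the helper names `build_entity_graph` and `neighbors_before`. Each row becomes a node typed by its table, each foreign-key reference becomes a typed edge, and edges carry the row timestamp so that a query can ignore links created after a chosen cutoff.

```python
from collections import defaultdict

# Hypothetical toy database: each table maps primary key -> row.
customers = {
    1: {"name": "Ada"},
    2: {"name": "Bo"},
}
orders = {
    10: {"customer_id": 1, "amount": 30.0, "timestamp": 5},
    11: {"customer_id": 1, "amount": 12.5, "timestamp": 9},
    12: {"customer_id": 2, "amount": 7.0, "timestamp": 7},
}
tables = {"customers": customers, "orders": orders}

# Hypothetical schema: (child table, foreign-key column, parent table).
foreign_keys = [("orders", "customer_id", "customers")]


def build_entity_graph(tables, foreign_keys):
    """Return typed nodes and typed edges derived from primary-foreign key links."""
    # Node identity is (table name, primary key), so node types follow the schema.
    nodes = {(t, pk): row for t, rows in tables.items() for pk, row in rows.items()}
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        for pk, row in tables[child].items():
            src, dst = (child, pk), (parent, row[fk_col])
            if dst in nodes:  # skip dangling foreign keys
                # The edge type is the schema triple; the timestamp rides along.
                edges[(child, fk_col, parent)].append((src, dst, row.get("timestamp")))
    return nodes, dict(edges)


def neighbors_before(edges, node, cutoff):
    """Rows pointing at `node` whose timestamp is at or before `cutoff`."""
    out = []
    for edge_list in edges.values():
        for src, dst, ts in edge_list:
            if dst == node and (ts is None or ts <= cutoff):
                out.append(src)
    return sorted(out)


nodes, edges = build_entity_graph(tables, foreign_keys)
```

Calling `neighbors_before(edges, ("customers", 1), 6)` returns only the order placed at time 5, which is one simple way to keep information from later rows out of a prediction made at time 6. A GNN-based RDL model would operate on learned row features and message passing over such a graph; this sketch covers only the graph construction step.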