REDELEX: A Framework for Relational Deep Learning Exploration

📅 2025-06-27
🤖 AI Summary
Existing research lacks a systematic analysis of how the performance of relational deep learning (RDL) models relates to the characteristics of the underlying relational databases (RDBs). Method: We introduce REDELEX, an exploration framework that treats RDBs as graph structures and evaluates RDL models of varying complexity on a diverse collection of over 70 RDBs, which is made available to the community. The RDL models are benchmarked alongside key representatives of classic methods. Contribution/Results: The study confirms the generally superior performance of RDL and identifies the main factors shaping that performance, including model complexity, database size, and structural properties.

📝 Abstract
Relational databases (RDBs) are widely regarded as the gold standard for storing structured information. Consequently, predictive tasks leveraging this data format hold significant application promise. Recently, Relational Deep Learning (RDL) has emerged as a novel paradigm wherein RDBs are conceptualized as graph structures, enabling the application of various graph neural architectures to effectively address these tasks. However, given its novelty, there is a lack of analysis into the relationships between the performance of various RDL models and the characteristics of the underlying RDBs. In this study, we present REDELEX, a comprehensive exploration framework for evaluating RDL models of varying complexity on the most diverse collection of over 70 RDBs, which we make available to the community. Benchmarked alongside key representatives of classic methods, we confirm the generally superior performance of RDL while providing insights into the main factors shaping performance, including model complexity, database sizes and their structural properties.
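
To make the idea of viewing an RDB as a graph concrete, the following is a minimal sketch, not the REDELEX implementation. It assumes a toy two-table database (the table names, columns, and the `rdb_to_hetero_graph` helper are all hypothetical) in which every table has an `id` primary key. Each row becomes a node typed by its table, and each foreign-key reference becomes a typed edge, with a reverse edge added so information can flow in both directions.

```python
from collections import defaultdict

# Hypothetical toy database: orders.customer_id references customers.id.
tables = {
    "customers": [
        {"id": 1, "name": "Ada"},
        {"id": 2, "name": "Bob"},
    ],
    "orders": [
        {"id": 10, "customer_id": 1, "amount": 25.0},
        {"id": 11, "customer_id": 1, "amount": 40.0},
        {"id": 12, "customer_id": 2, "amount": 15.0},
    ],
}
# Each entry: (child table, foreign-key column, parent table).
foreign_keys = [("orders", "customer_id", "customers")]


def rdb_to_hetero_graph(tables, foreign_keys, pk="id"):
    """Map rows to typed nodes and foreign-key references to typed edges."""
    # One node per row, keyed by primary key; the remaining columns
    # (everything except the primary key) are kept as node attributes.
    nodes = {
        name: {row[pk]: {k: v for k, v in row.items() if k != pk} for row in rows}
        for name, rows in tables.items()
    }
    edges = defaultdict(list)
    for child, fk_col, parent in foreign_keys:
        for row in tables[child]:
            ref = row[fk_col]
            if ref not in nodes[parent]:
                continue  # skip NULL or dangling references
            # Forward edge (child row -> parent row) and its reverse.
            edges[(child, fk_col, parent)].append((row[pk], ref))
            edges[(parent, "rev_" + fk_col, child)].append((ref, row[pk]))
    return nodes, dict(edges)


nodes, edges = rdb_to_hetero_graph(tables, foreign_keys)
print(edges[("orders", "customer_id", "customers")])
```

A graph neural architecture would then operate on these typed nodes and edges; how node attributes are encoded into feature vectors is left out of this sketch.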

Problem

Research questions and friction points this paper is trying to address.

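Analyzes how RDL model performance relates to the characteristics of the underlying RDBs
Evaluates RDL models of varying complexity on a diverse collection of over 70 RDBs
Identifies the main factors shaping RDL performance: model complexity, database size, and structural properties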

Innovation

Methods, ideas, or system contributions that make the work stand out.

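REDELEX, a framework for evaluating RDL models of varying complexity
Application of graph neural architectures to relational databases represented as graphs
Analysis of how model complexity, database size, and structural properties shape RDL performance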
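
As an illustration of what applying a graph neural architecture to a relational database can look like, here is a minimal sketch of a single round of mean-aggregation message passing over typed edges. It is an assumption-laden toy, not one of the models evaluated in the paper: the graph, the `placed_by` relation name, and the `mean_aggregate` helper are hypothetical, and real GNN layers add learnable transformations and nonlinearities on top of this aggregation step.

```python
# Hypothetical toy heterogeneous graph with one scalar feature per node,
# keyed by (node_type, node_id).
features = {
    ("customers", 1): 0.0,
    ("customers", 2): 0.0,
    ("orders", 10): 25.0,
    ("orders", 11): 40.0,
    ("orders", 12): 15.0,
}
# Typed edges: (source_type, relation, target_type) -> [(source_id, target_id)].
edges = {
    ("orders", "placed_by", "customers"): [(10, 1), (11, 1), (12, 2)],
}


def mean_aggregate(features, edges):
    """For each (target node, relation), average the features of its source nodes."""
    incoming = {}
    for (src_type, rel, dst_type), pairs in edges.items():
        for src_id, dst_id in pairs:
            key = ((dst_type, dst_id), rel)
            incoming.setdefault(key, []).append(features[(src_type, src_id)])
    return {key: sum(vals) / len(vals) for key, vals in incoming.items()}


messages = mean_aggregate(features, edges)
print(messages[(("customers", 1), "placed_by")])  # mean of 25.0 and 40.0
```

Keeping a separate aggregate per relation type is what distinguishes this from message passing on a homogeneous graph: a customer node can receive differently treated messages from orders, reviews, or any other table that references it.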