Relatron: Automating Relational Machine Learning over Relational Databases

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of predictive modeling over relational databases, where cross-table dependencies and feature interactions are difficult to model effectively, and where there is no systematic understanding of the performance differences between relational deep learning (RDL) and traditional methods like Deep Feature Synthesis (DFS), nor a task-adaptive mechanism for choosing between them. To bridge this gap, the authors define a unified design space encompassing both RDL and DFS, construct a performance repository via architecture-centric search, and introduce two task-aware signals—task homophily and an affinity embedding—that explain the performance disparities. Building on these insights, they develop Relatron, a lightweight meta-selector that uses loss-landscape flatness metrics for improved robustness. Experiments show that Relatron achieves up to an 18.5% performance gain over strong baselines in joint hyperparameter and architecture optimization, at a computational cost ten times lower than Fisher information–based approaches, effectively mitigating the "more tuning, worse performance" phenomenon.
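The summary mentions using loss-landscape flatness to guard against brittle checkpoints. A minimal, illustrative sketch of that idea (not the paper's actual metric) is a perturbation-based sharpness proxy: average the loss increase under small random weight perturbations and prefer the checkpoint with the lower value. The `flat`/`sharp` toy losses below are assumptions for demonstration only.

```python
import numpy as np

def sharpness(loss_fn, weights, radius=0.05, n_samples=20, seed=0):
    """Perturbation-based sharpness proxy: mean loss increase under
    small random perturbations of the weights. Lower = flatter optimum."""
    rng = np.random.default_rng(seed)
    base = loss_fn(weights)
    increases = []
    for _ in range(n_samples):
        noise = rng.normal(size=weights.shape)
        noise *= radius / np.linalg.norm(noise)  # project onto a radius-r ball
        increases.append(loss_fn(weights + noise) - base)
    return float(np.mean(increases))

# Two toy "checkpoints" with the same minimum loss but different curvature.
flat  = lambda w: 0.5 * np.sum(w ** 2)    # gentle bowl
sharp = lambda w: 50.0 * np.sum(w ** 2)   # steep bowl
w_star = np.zeros(10)                     # both minimized at the origin

prefer_flat = sharpness(flat, w_star) < sharpness(sharp, w_star)
```

Here `prefer_flat` is `True`: both checkpoints reach the same validation loss, but the flatter one is less sensitive to perturbation, which is the property a selector would use as a tie-breaker.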

📝 Abstract
Predictive modeling over relational databases (RDBs) powers many applications, yet remains challenging because it requires capturing both cross-table dependencies and complex feature interactions. Relational Deep Learning (RDL) methods automate feature engineering via message passing, while classical approaches like Deep Feature Synthesis (DFS) rely on predefined non-parametric aggregators. Despite performance gains, the comparative advantages of RDL over DFS and the design principles for selecting effective architectures remain poorly understood. We present a comprehensive study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks. Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and (3) validation accuracy is an unreliable guide for architecture choice. This search yields a model performance bank that links architecture configurations to their performance; leveraging this bank, we analyze the drivers of the RDL-DFS performance gap and introduce two task signals -- RDB task homophily and an affinity embedding that captures size, path, feature, and temporal structure -- whose correlation with the gap enables principled routing. Guided by these signals, we propose Relatron, a task embedding-based meta-selector that chooses between RDL and DFS and prunes the within-family search. Lightweight loss-landscape metrics further guard against brittle checkpoints by preferring flatter optima. In experiments, Relatron resolves the "more tuning, worse performance" effect and, in joint hyperparameter-architecture optimization, achieves up to 18.5% improvement over strong baselines with 10x lower cost than Fisher information-based alternatives.
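The abstract describes routing between RDL and DFS via task signals and a performance bank. As a rough illustration of how such a meta-selector could work (the bank contents, 2-D embeddings, and k-NN vote below are all hypothetical stand-ins, not the paper's method), one can embed a new task and pick the family that won on the most similar past tasks:

```python
import numpy as np

# Hypothetical performance bank: each row is a past task's embedding
# (e.g., [homophily, join-path depth score]) and the family that won on it.
bank_embeddings = np.array([
    [0.9, 0.1],   # high homophily, shallow join paths
    [0.8, 0.2],
    [0.1, 0.9],   # low homophily, deep join paths
    [0.2, 0.8],
])
bank_winner = np.array(["RDL", "RDL", "DFS", "DFS"])

def route(task_embedding, k=3):
    """k-NN vote over the bank: choose the family that won on the
    k most cosine-similar past tasks."""
    e = np.asarray(task_embedding, dtype=float)
    sims = bank_embeddings @ e / (
        np.linalg.norm(bank_embeddings, axis=1) * np.linalg.norm(e) + 1e-12
    )
    top = np.argsort(-sims)[:k]                 # indices of k nearest tasks
    winners, counts = np.unique(bank_winner[top], return_counts=True)
    return winners[np.argmax(counts)]           # majority vote
```

Under this toy bank, a high-homophily task routes to RDL and a low-homophily one to DFS; the actual Relatron selector additionally prunes the within-family architecture search, which this sketch omits.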
Problem

Research questions and friction points this paper is trying to address.

Relational Machine Learning
Relational Databases
Feature Engineering
Model Selection
Architecture Design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Relational Deep Learning
Deep Feature Synthesis
Task Embedding
Architecture Selection
Meta-learning