🤖 AI Summary
Existing relational deep learning (RDL) models predominantly rely on task-specific supervised training, limiting their scalability and representation reusability. To address this, we propose the first task-agnostic contrastive pretraining framework for relational databases. Our method employs a three-level contrastive objective, operating at the row, link, and context levels, to enable universal cross-table and cross-database representation learning. We introduce a multi-granularity contrastive learning mechanism, integrated with a heterogeneous graph neural network architecture and an efficient relation-aware sampling strategy, to support diverse downstream tasks. On standard RDL benchmarks, fine-tuning our pretrained model consistently outperforms training from scratch, demonstrating its capacity to learn generalizable, transferable representations. This work establishes a foundation-model paradigm for RDL, advancing representation reusability and adaptability across relational data applications.
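The summary does not spell out how the three-level objective is computed. A minimal sketch of one common way to combine multi-granularity contrastive terms (an InfoNCE-style loss per level, summed with weights) is below; the function names, the weighting scheme, and the use of in-batch negatives are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor's positive is the same-index row of
    `positives`; the other rows in the batch act as negatives."""
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy against the identity pairing (diagonal)
    return -np.mean(np.diag(log_probs))

def multi_level_loss(row_pairs, link_pairs, ctx_pairs, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of row-, link-, and context-level contrastive terms,
    each given as an (anchors, positives) pair of embedding matrices."""
    losses = [info_nce(a, p) for (a, p) in (row_pairs, link_pairs, ctx_pairs)]
    return float(np.dot(weights, losses))
```

Correctly paired embeddings should incur a lower loss than mismatched ones, which is the property each of the three levels exploits on its own granularity of positives (augmented rows, linked rows, shared contexts).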
📝 Abstract
Relational Deep Learning (RDL) is an emerging paradigm that leverages Graph Neural Network principles to learn directly from relational databases by representing them as heterogeneous graphs. However, existing RDL models typically rely on task-specific supervised learning, requiring a separate model to be trained for each predictive task, which may hamper scalability and reuse.
In this work, we propose a novel task-agnostic contrastive pretraining approach for RDL that enables database-wide representation learning. To that end, we introduce three levels of contrastive objectives (row-level, link-level, and context-level) designed to capture the structural and semantic heterogeneity inherent to relational data. We implement this pretraining approach through a modular RDL architecture and an efficient sampling strategy tailored to the heterogeneous database setting. Our preliminary results on standard RDL benchmarks demonstrate that fine-tuning the pretrained models measurably outperforms training from scratch, validating the promise of the proposed methodology for learning transferable representations from relational data.
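The sampling strategy for the heterogeneous database setting is only named, not described. A plausible sketch of relation-aware neighbour sampling (hop-wise expansion with a per-relation fanout cap, as is typical for heterogeneous graphs built from foreign-key links) follows; the adjacency format, schema, relation names, and fanout values are all hypothetical.

```python
import random

def sample_subgraph(adj, seed, fanout, num_hops=2, rng=None):
    """Sample a subgraph around a seed row.

    adj:    node -> {relation: [neighbour nodes]}  (foreign-key links)
    fanout: relation -> max neighbours to keep per node for that relation
    Returns the set of visited nodes and the sampled (src, rel, dst) edges.
    """
    rng = rng or random.Random(0)
    nodes, frontier, edges = {seed}, [seed], []
    for _ in range(num_hops):
        nxt = []
        for u in frontier:
            for rel, nbrs in adj.get(u, {}).items():
                # cap the expansion per relation to bound subgraph size
                k = min(fanout.get(rel, 0), len(nbrs))
                for v in rng.sample(nbrs, k):
                    edges.append((u, rel, v))
                    if v not in nodes:
                        nodes.add(v)
                        nxt.append(v)
        frontier = nxt
    return nodes, edges
```

On a toy users/orders/products schema, seeding at a user and capping each relation's fanout yields a small, relation-balanced subgraph suitable as a pretraining context window.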