🤖 AI Summary
Existing relational deep learning (RDL) models predominantly rely on task-specific supervised training, limiting their scalability and representation reusability. To address this, we propose the first task-agnostic contrastive pretraining framework for relational databases. Our method employs a three-level contrastive objective, operating at the row, link, and context levels, to enable universal cross-table and cross-database representation learning. We introduce a multi-granularity contrastive learning mechanism, integrated with a heterogeneous graph neural network architecture and an efficient relation-aware sampling strategy, to support diverse downstream tasks. On standard RDL benchmarks, fine-tuning our pretrained model consistently outperforms training from scratch, demonstrating its capacity to learn generalizable, transferable representations. This work establishes a foundation-model paradigm for RDL, advancing representation reusability and adaptability across relational data applications.
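The summary does not spell out how the three-level objective is computed. A minimal sketch of one common way to combine multi-granularity contrastive terms (an InfoNCE-style loss per level, summed with weights) is below; the function names, the weighting scheme, and the use of in-batch negatives are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor's positive is the same-index row of
    `positives`; the other rows in the batch act as negatives."""
    # L2-normalise so dot products are cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # cross-entropy against the identity pairing (diagonal)
    return -np.mean(np.diag(log_probs))

def multi_level_loss(row_pairs, link_pairs, ctx_pairs, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of row-, link-, and context-level contrastive terms,
    each given as an (anchors, positives) pair of embedding matrices."""
    losses = [info_nce(a, p) for (a, p) in (row_pairs, link_pairs, ctx_pairs)]
    return float(np.dot(weights, losses))
```

Correctly paired embeddings should incur a lower loss than mismatched ones, which is the property each of the three levels exploits on its own granularity of positives (augmented rows, linked rows, shared contexts).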
📝 Abstract
Relational Deep Learning (RDL) is an emerging paradigm that leverages Graph Neural Network principles to learn directly from relational databases by representing them as heterogeneous graphs. However, existing RDL models typically rely on task-specific supervised learning, requiring a separate model to be trained for each predictive task, which may hamper scalability and reuse.
In this work, we propose a novel task-agnostic contrastive pretraining approach for RDL that enables database-wide representation learning. To that end, we introduce three levels of contrastive objectives (row-level, link-level, and context-level) designed to capture the structural and semantic heterogeneity inherent to relational data. We implement this pretraining approach through a modular RDL architecture and an efficient sampling strategy tailored to the heterogeneous database setting. Our preliminary results on standard RDL benchmarks demonstrate that fine-tuning the pretrained models measurably outperforms training from scratch, validating the promise of the proposed methodology for learning transferable representations from relational data.
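The sampling strategy for the heterogeneous database setting is only named, not described. A plausible sketch of relation-aware neighbour sampling (hop-wise expansion with a per-relation fanout cap, as is typical for heterogeneous graphs built from foreign-key links) follows; the adjacency format, schema, relation names, and fanout values are all hypothetical.

```python
import random

def sample_subgraph(adj, seed, fanout, num_hops=2, rng=None):
    """Sample a subgraph around a seed row.

    adj:    node -> {relation: [neighbour nodes]}  (foreign-key links)
    fanout: relation -> max neighbours to keep per node for that relation
    Returns the set of visited nodes and the sampled (src, rel, dst) edges.
    """
    rng = rng or random.Random(0)
    nodes, frontier, edges = {seed}, [seed], []
    for _ in range(num_hops):
        nxt = []
        for u in frontier:
            for rel, nbrs in adj.get(u, {}).items():
                # cap the expansion per relation to bound subgraph size
                k = min(fanout.get(rel, 0), len(nbrs))
                for v in rng.sample(nbrs, k):
                    edges.append((u, rel, v))
                    if v not in nodes:
                        nodes.add(v)
                        nxt.append(v)
        frontier = nxt
    return nodes, edges
```

On a toy users/orders/products schema, seeding at a user and capping each relation's fanout yields a small, relation-balanced subgraph suitable as a pretraining context window.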