RelPrism: A Multi-Faceted Pre-training Framework with Self-Generated Tasks for Relational Databases

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing self-supervised pre-training methods for relational databases typically draw supervision from a single perspective, which limits their ability to supply the multi-view, multi-granularity information that downstream tasks need. This work proposes RelPrism, a framework that introduces, for the first time in relational database pre-training, a self-generated pseudo-task mechanism: it builds three views (intrinsic, relational, and hybrid attributes) and applies multi-granularity clustering to each view to enable multi-faceted self-supervised learning. Across 14 tasks spanning five real-world datasets, the approach improves ROC-AUC by an average of 4.15% on classification tasks and reduces MAE by 10.75% on regression tasks, consistently outperforming current state-of-the-art baselines.
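To make the idea of a pseudo-task pool concrete, here is a minimal sketch, not the authors' implementation. It assumes each view has already been reduced to one scalar per row and swaps in a simple deterministic 1-D k-means as the clustering step; the names `kmeans_1d` and `build_pseudo_task_pool`, the granularities `(2, 3)`, and the toy values are all hypothetical.

```python
def kmeans_1d(values, k, iters=20):
    """Cluster scalars into k groups with Lloyd's algorithm (deterministic init)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: seed centroids at evenly spaced positions of the sorted values.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign every value to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members, if it has any.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def build_pseudo_task_pool(view_attributes, granularities=(2, 3)):
    """Return {(view, k): pseudo-labels}, one pseudo-task per view and granularity."""
    pool = {}
    for view, values in view_attributes.items():
        for k in granularities:
            pool[(view, k)] = kmeans_1d(values, k)
    return pool


# Hypothetical per-row scalars for the three views named in the summary.
view_attributes = {
    "intrinsic": [1.0, 2.0, 3.0, 101.0, 102.0, 103.0],
    "relational": [0.0, 1.0, 0.0, 7.0, 8.0, 9.0],
    "hybrid": [0.5, 1.5, 1.0, 54.0, 55.0, 56.0],
}
pool = build_pseudo_task_pool(view_attributes)
```

Each entry of `pool` can then serve as the label set for one self-supervised classification task, so a coarse clustering (`k=2`) and a finer one (`k=3`) of the same view yield pseudo-tasks at different granularities.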

📝 Abstract

Relational databases (RDBs) remain the cornerstone of modern data systems and support diverse predictive tasks. Recent relational deep learning (RDL) methods enable end-to-end prediction by converting RDBs into graphs, where rows are represented as nodes and inter-table interactions are represented as edges, and then applying graph-based models for representation learning. Despite the strong capability of RDL, effective self-supervised pre-training for RDBs remains non-trivial. RDB tasks often require multi-faceted information across different perspectives and granularities. For example, user churn classification may rely more on interaction patterns, whereas consumption value prediction requires both user-item behaviors and intrinsic user attributes for fine-grained regression. Such heterogeneous needs challenge RDB representation learning, as pre-training objectives should cover comprehensive information for downstream adaptation. However, existing SSL methods typically derive supervision from a single facet, such as node-level intrinsic attributes or subgraph-level relational structures, providing limited adaptability. To this end, we propose RelPrism, a multi-faceted self-supervised learning framework for RDBs. RelPrism constructs intrinsic, relational, and hybrid attributes from distinct perspectives, and applies multi-granularity clustering to each perspective to form corresponding pseudo-task pools. Pre-training over these pools exposes representations to broader perspectives and granularity levels, yielding a stronger basis for downstream adaptation. Experiments on 14 tasks across 5 real-world datasets show that RelPrism improves ROC-AUC by 4.15% for classification and reduces MAE by 10.75% for regression over state-of-the-art baselines. Our code is available at https://anonymous.4open.science/r/RelPrism.
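The graph conversion described in the abstract (rows as nodes, inter-table links as edges) can be sketched as follows. This is an illustrative sketch under stated assumptions: tables are held as in-memory dictionaries keyed by primary key, links are plain foreign-key references, and the function name `rdb_to_graph` and the toy `users`/`orders` schema are hypothetical rather than taken from the paper.

```python
def rdb_to_graph(tables, foreign_keys):
    """Turn tables into a graph: one node per row, one edge per foreign-key link.

    tables: {table_name: {primary_key: row_dict}}
    foreign_keys: list of (child_table, fk_column, parent_table)
    """
    # Every row becomes a node identified by (table name, primary key).
    nodes = {}
    for table_name, rows in tables.items():
        for pk, row in rows.items():
            nodes[(table_name, pk)] = row

    # Every valid foreign-key reference becomes an edge child_row -> parent_row.
    edges = []
    for child, fk_column, parent in foreign_keys:
        for pk, row in tables[child].items():
            target = row.get(fk_column)
            if target is not None and target in tables[parent]:
                edges.append(((child, pk), (parent, target)))
    return nodes, edges


# Hypothetical two-table database.
tables = {
    "users": {1: {"age": 31}, 2: {"age": 45}},
    "orders": {
        10: {"user_id": 1, "amount": 20.0},
        11: {"user_id": 1, "amount": 5.5},
        12: {"user_id": 2, "amount": 99.0},
    },
}
foreign_keys = [("orders", "user_id", "users")]
nodes, edges = rdb_to_graph(tables, foreign_keys)
```

A graph-based model would then operate on `nodes` and `edges`; node attributes such as `age` correspond to intrinsic information, while the edge structure carries the relational information the abstract refers to.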

Problem

Research questions and friction points this paper is trying to address.

relational databases

self-supervised learning

pre-training

multi-faceted representation

heterogeneous tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-faceted pre-training

self-supervised learning

relational databases