Griffin: Towards a Graph-Centric Relational Database Foundation Model

📅 2025-05-08
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing relational database (RDB) models struggle to jointly address diverse downstream tasks. To address this limitation, we propose Griffin, the first graph-centric foundation model for RDBs. Griffin introduces a unified architecture comprising a shared data encoder and task-specific decoders, integrating heterogeneous feature encoding, enhanced message-passing neural networks (MPNNs), cross-attention mechanisms, and a novel graph aggregator. It supports both single-table and multi-table joint pretraining on large-scale real-world RDB graphs containing over 150 million nodes. Empirical evaluation shows that Griffin matches or surpasses state-of-the-art (SOTA) single-task models across multiple benchmarks. Moreover, it achieves substantial accuracy gains in few-shot settings and demonstrates strong cross-dataset and cross-task transferability, significantly improving generalization under low-resource conditions.
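The summary lists cross-attention among Griffin's components. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which one query vector (assumed here to be a task embedding) attends over the per-column embeddings of a table row. The function names and the task-embedding framing are hypothetical and not taken from the Griffin code base; the real module may use learned projections, multiple heads, or a different query source.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over a set of key/value vectors.

    Hypothetical framing: `query` is a task embedding and each key/value is the
    embedding of one column of a table row, so the result is a task-weighted
    summary of that row. No learned projections or multiple heads are used.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Toy usage: a 2-d task embedding attending over two column embeddings.
task = [1.0, 0.0]
columns = [[1.0, 0.0], [0.0, 1.0]]
summary = cross_attention(task, columns, columns)
print(summary)  # the first column, which aligns with the task, gets more weight
```

Because the attention weights depend on the query, the same row can be summarized differently for different tasks, which is one plausible reason to pair a shared encoder with task-conditioned components.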

πŸ“ Abstract
We introduce Griffin, the first attempt at a foundation model designed specifically for Relational Databases (RDBs). Unlike previous smaller models focused on single RDB tasks, Griffin unifies the data encoder and task decoder to handle diverse tasks. Additionally, we enhance the architecture by incorporating a cross-attention module and a novel aggregator. Griffin is pretrained on both single-table and RDB datasets, employing advanced encoders for categorical, numerical, and metadata features, along with cross-attention modules and enhanced message-passing neural networks (MPNNs) to capture the complexities of relational data. Evaluated on large-scale, heterogeneous, and temporal graphs extracted from RDBs across various domains (spanning over 150 million nodes), Griffin performs better than or comparably to individually trained models, excels in low-data scenarios, and transfers well to new datasets and tasks, benefiting from both similarity and diversity in its pretraining data. These results highlight its potential as a universally applicable foundation model for RDBs. Code is available at https://github.com/yanxwb/Griffin.
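The abstract evaluates Griffin on graphs extracted from RDBs. A common way to build such a graph, assumed here rather than taken from the paper, is to make each row a node and each foreign-key reference an edge; an MPNN layer then updates every node from its neighbours. The minimal sketch below uses a hypothetical two-table schema (`orders` referencing `customers`) and a plain, untrained mean aggregator. It deliberately ignores the per-table feature heterogeneity, timestamps, and learned weights that a real model such as Griffin would use.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A node is identified by (table name, primary key); table names are hypothetical.
Node = Tuple[str, int]


def build_graph(fk_links: List[Tuple[Node, Node]]) -> Dict[Node, List[Node]]:
    """Turn (child row, referenced parent row) foreign-key pairs into an undirected adjacency list."""
    adj: Dict[Node, List[Node]] = defaultdict(list)
    for child, parent in fk_links:
        adj[child].append(parent)
        adj[parent].append(child)
    return dict(adj)


def mpnn_step(adj: Dict[Node, List[Node]], h: Dict[Node, List[float]]) -> Dict[Node, List[float]]:
    """One untrained message-passing layer: average a node's state with the mean of its neighbours."""
    out: Dict[Node, List[float]] = {}
    for node, state in h.items():
        nbrs = adj.get(node, [])
        if nbrs:
            msg = [sum(h[n][i] for n in nbrs) / len(nbrs) for i in range(len(state))]
        else:
            msg = state  # isolated rows keep their own state
        out[node] = [(s + m) / 2 for s, m in zip(state, msg)]
    return out


# Toy schema: two orders rows reference the same customers row.
links = [(("orders", 1), ("customers", 10)), (("orders", 2), ("customers", 10))]
adj = build_graph(links)
h0 = {("customers", 10): [0.0], ("orders", 1): [1.0], ("orders", 2): [3.0]}
h1 = mpnn_step(adj, h0)
print(h1)  # {('customers', 10): [1.0], ('orders', 1): [0.5], ('orders', 2): [1.5]}
```

After one step the customer node already reflects the values of its two orders, which is the basic mechanism by which an MPNN propagates information across joined tables; stacking layers reaches rows that are several foreign-key hops away.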
Problem

Research questions and friction points this paper is trying to address.

Develops the first foundation model for relational databases
Unifies data encoder and task decoder for diverse tasks
Enhances architecture with cross-attention and novel aggregator
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified data encoder and task decoder
Cross-attention module and novel aggregator
Pretraining with advanced encoders and MPNNs
Yanbo Wang
Institute for Artificial Intelligence, Peking University
Xiyuan Wang
Institute for Artificial Intelligence, Peking University
Quan Gan
Amazon Web Services
Minjie Wang
Amazon Web Services
Qibin Yang
Institute for Artificial Intelligence, Peking University
David Wipf
Principal Research Scientist, Amazon Web Services
deep generative models, sparse representations, Bayesian inference, graph neural networks
Muhan Zhang
Peking University
Machine Learning, Graph Neural Network, Large Language Models