RelDiff: Relational Data Generative Modeling with Graph-Based Diffusion Models

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing relational data generation methods struggle to model complex structural and statistical dependencies across multiple tables. They often reduce the problem to conditional generation of flat tables under strong structural assumptions. This work introduces a diffusion-based generative framework that explicitly models the foreign-key graph to synthesize complete relational databases. Its key contributions are: (1) a joint graph-conditioned diffusion process across all tables for attribute synthesis; (2) a 2K+SBM graph generator based on the stochastic block model (SBM) for structure generation, where decoupling the graph structure from the attribute values ensures referential integrity; and (3) novel techniques including foreign-key-aware graph encoding, multi-table joint noise scheduling, and relation-aware normalization. Evaluated on 11 benchmark datasets, the method consistently outperforms state-of-the-art approaches, with significant improvements in statistical fidelity, query consistency, and utility for downstream machine learning tasks.

📝 Abstract
Real-world databases are predominantly relational, comprising multiple interlinked tables that contain complex structural and statistical dependencies. Learning generative models on relational data has shown great promise in generating synthetic data and imputing missing values. However, existing methods often struggle to capture this complexity, typically reducing relational data to conditionally generated flat tables and imposing limiting structural assumptions. To address these limitations, we introduce RelDiff, a novel diffusion generative model that synthesizes complete relational databases by explicitly modeling their foreign key graph structure. RelDiff combines a joint graph-conditioned diffusion process across all tables for attribute synthesis, and a $2K+$SBM graph generator based on the Stochastic Block Model for structure generation. The decomposition of graph structure and relational attributes ensures both high fidelity and referential integrity, both of which are crucial aspects of synthetic relational database generation. Experiments on 11 benchmark datasets demonstrate that RelDiff consistently outperforms prior methods in producing realistic and coherent synthetic relational databases. Code is available at https://github.com/ValterH/RelDiff.
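
The split between structure generation and attribute generation can be illustrated with a toy foreign-key generator. This is a simplified, hypothetical stand-in and not the paper's 2K+SBM generator. It assumes that every parent and child row already carries a block label, that a table of probabilities P(parent block | child block) is given, and that each parent has a non-negative weight, such as a target number of children. Every child picks its parent from the set of existing parent ids, so no foreign key can dangle. That is the sense in which generating the graph first guarantees referential integrity. The function name and all parameters are illustrative.

```python
import random
from collections import Counter


def sample_fk_graph(parent_blocks, child_blocks, block_probs, parent_weights, seed=0):
    """Assign each child row a parent id using a block-structured (SBM-style) rule.

    Hypothetical helper, not RelDiff's generator.
    parent_blocks:  block label of each parent row (list index = parent id)
    child_blocks:   block label of each child row
    block_probs:    {child_block: {parent_block: probability}}
    parent_weights: relative attractiveness of each parent (e.g. a target degree)
    """
    rng = random.Random(seed)
    # Group existing parent ids by block; only these ids can ever be sampled.
    members = {}
    for pid, block in enumerate(parent_blocks):
        members.setdefault(block, []).append(pid)

    fks = []
    for child_block in child_blocks:
        probs = block_probs[child_block]
        # Step 1: choose a parent block, skipping blocks that have no parents.
        blocks = [b for b in probs if b in members]
        parent_block = rng.choices(blocks, weights=[probs[b] for b in blocks])[0]
        # Step 2: choose a concrete parent inside that block, weighted by parent_weights.
        candidates = members[parent_block]
        pid = rng.choices(candidates, weights=[parent_weights[p] for p in candidates])[0]
        fks.append(pid)
    return fks


parent_blocks = [0, 0, 1, 1, 1]
child_blocks = [0, 0, 0, 1, 1, 1, 1]
block_probs = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
parent_weights = [3.0, 1.0, 2.0, 1.0, 1.0]
fks = sample_fk_graph(parent_blocks, child_blocks, block_probs, parent_weights, seed=42)
print(Counter(fks))  # number of children linked to each parent id
```

A generator like this only decides which rows link to which. Attribute values would then be synthesized conditioned on the sampled graph. Degree statistics beyond the per-parent weights are not controlled in this sketch.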
Problem

Research questions and friction points this paper is trying to address.

Modeling complex dependencies in relational databases
Generating synthetic relational data with integrity
Improving fidelity in relational database synthesis
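
Referential integrity, one of the properties listed above, requires every non-null foreign key value to match a primary key in the referenced table. The following minimal checker assumes a hypothetical in-memory layout chosen for illustration: each table is a dict holding its primary-key column name and a list of row dicts. It could be used to test synthetic output:

```python
def referential_integrity_violations(tables, foreign_keys):
    """Return (child_table, fk_column, value) for every dangling foreign key.

    tables:       {name: {"pk": primary_key_column, "rows": [row_dict, ...]}}
    foreign_keys: [(child_table, fk_column, parent_table), ...]
    """
    # Collect the set of primary-key values present in each table.
    pk_sets = {name: {row[t["pk"]] for row in t["rows"]} for name, t in tables.items()}
    violations = []
    for child, col, parent in foreign_keys:
        for row in tables[child]["rows"]:
            value = row[col]
            # A NULL (None) foreign key does not violate the constraint, as in SQL.
            if value is not None and value not in pk_sets[parent]:
                violations.append((child, col, value))
    return violations


tables = {
    "customers": {"pk": "customer_id", "rows": [{"customer_id": 1}, {"customer_id": 2}]},
    "orders": {"pk": "order_id", "rows": [
        {"order_id": 10, "customer_id": 1},
        {"order_id": 11, "customer_id": 3},  # dangling: customer 3 does not exist
    ]},
}
print(referential_integrity_violations(tables, [("orders", "customer_id", "customers")]))
# → [('orders', 'customer_id', 3)]
```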
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based diffusion models for relational data
Joint graph-conditioned diffusion for attribute synthesis
2K+SBM graph generator for structure generation
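
The following toy sketch works on plain Python lists to illustrate two ideas behind joint graph-conditioned diffusion. It rests on several assumptions and is not RelDiff's model. It handles numeric attributes only and uses a standard DDPM-style Gaussian forward process with a linear beta schedule. One timestep t is shared by every table. A single hop of mean aggregation along foreign keys stands in for the graph conditioning a denoiser would receive. The schedule constants and function names are hypothetical.

```python
import math
import random


def alpha_bar(t, T, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under a linear beta schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / max(T - 1, 1)
        prod *= 1.0 - beta
    return prod


def noise_tables(tables, t, T, rng):
    """Sample x_t ~ q(x_t | x_0) for every table, using the SAME timestep t for all of them."""
    ab = alpha_bar(t, T)
    a, s = math.sqrt(ab), math.sqrt(1.0 - ab)
    return {name: [[a * x + s * rng.gauss(0.0, 1.0) for x in row] for row in rows]
            for name, rows in tables.items()}


def neighbour_inputs(child_rows, parent_rows, fks):
    """Concatenate one-hop foreign-key context onto each row's features.

    Each child row is extended with its parent's features; each parent row is
    extended with the mean of its children's features (zeros if it has none).
    """
    child_in = [c + parent_rows[p] for c, p in zip(child_rows, fks)]
    dim = len(child_rows[0]) if child_rows else 0
    sums = [[0.0] * dim for _ in parent_rows]
    counts = [0] * len(parent_rows)
    for c, p in zip(child_rows, fks):
        counts[p] += 1
        for j, x in enumerate(c):
            sums[p][j] += x
    parent_in = [row + [v / counts[i] if counts[i] else 0.0 for v in sums[i]]
                 for i, row in enumerate(parent_rows)]
    return child_in, parent_in


rng = random.Random(0)
tables = {"users": [[1.0, 0.5], [-0.3, 2.0]], "orders": [[10.0], [3.0], [7.0]]}
noisy = noise_tables(tables, t=500, T=1000, rng=rng)
child_in, parent_in = neighbour_inputs(noisy["orders"], noisy["users"], fks=[0, 0, 1])
```

A learned denoiser, for example a graph neural network, would take these concatenated inputs together with t and predict the added noise. In this sketch, sharing t across tables is what makes the noising joint rather than table-by-table.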
Valter Hudovernik
University of Ljubljana
machine learning, deep learning, synthetic data

Minkai Xu
Stanford University
Generative AI

Juntong Shi
Stanford University

Lovro Šubelj
University of Ljubljana

Stefano Ermon
Stanford University
Artificial Intelligence, Machine Learning

Erik Štrumbelj
University of Ljubljana

J. Leskovec
Stanford University