Generating Synthetic Relational Tabular Data via Structural Causal Models

📅 2025-07-04
🤖 AI Summary
Existing synthetic tabular data methods struggle to model the cross-table causal dependencies prevalent in real-world relational databases, which distorts the synthetic data and limits its practical use. This paper proposes the first structural causal model (SCM)-based framework for relational tabular data generation. It represents inter-table dependencies explicitly as a causal graph and integrates probabilistic graphical models with conditional generative mechanisms to model multiple tables jointly, enabling interpretable and controllable synthesis of complex causal structures extracted from real relational databases. Experiments demonstrate that the generated data significantly outperform state-of-the-art baselines in structural fidelity, statistical consistency, and downstream task performance (including TabPFN training and evaluation), substantially enhancing the realism and practical utility of synthetic relational data.

📝 Abstract
Synthetic tabular data generation has received increasing attention in recent years, particularly with the emergence of foundation models for tabular data. The breakthrough success of TabPFN (Hollmann et al., 2025), which leverages vast quantities of synthetic tabular datasets derived from structural causal models (SCMs), demonstrates the critical role synthetic data plays in developing powerful tabular foundation models. However, most real-world tabular data exists in relational formats spanning multiple interconnected tables, a structure not adequately addressed by current generation methods. In this work, we extend the SCM-based approach by developing a novel framework that generates realistic synthetic relational tabular data, including causal relationships across tables. Our experiments confirm that this framework can construct relational datasets with complex inter-table dependencies that mimic real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Generating synthetic relational tabular data
Addressing complex inter-table dependencies
Extending SCMs for multi-table causality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends SCMs for relational tabular data
Generates inter-table causal relationships
Mimics real-world complex dependencies
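
As a toy illustration of what an inter-table causal relationship can look like, the sketch below samples a parent table and a child table linked by a foreign key, where one child column is generated from a parent column. This is not the authors' framework: the function name `sample_relational_scm`, the column names, and the linear mechanisms with Gaussian noise are all hypothetical choices made for the example.

```python
import random

def sample_relational_scm(n_parents=5, max_children=3, seed=0):
    """Sample a parent and a child table from a toy SCM (hypothetical example)."""
    rng = random.Random(seed)
    parents, children = [], []
    for pid in range(n_parents):
        # Within-table mechanism: x1 is exogenous, x2 := 2 * x1 + noise.
        x1 = rng.gauss(0.0, 1.0)
        x2 = 2.0 * x1 + rng.gauss(0.0, 0.1)
        parents.append({"parent_id": pid, "x1": x1, "x2": x2})
        # Each parent row receives between 1 and max_children child rows.
        for _ in range(rng.randint(1, max_children)):
            # Cross-table mechanism: y is caused by the parent's x2,
            # reached through the foreign key parent_id.
            y = -0.5 * x2 + rng.gauss(0.0, 0.1)
            children.append(
                {"child_id": len(children), "parent_id": pid, "y": y}
            )
    return parents, children

parents, children = sample_relational_scm()
```

Every child row stores the `parent_id` it was generated from, so joining the two tables on that key recovers the dependency of `y` on `x2`. A generator that sampled each table independently would lose that dependency.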
Frederik Hoppe
RWTH Aachen University
Astrid Franz
CONTACT Software GmbH, Wiener Str. 1-3, 28359 Bremen, Germany
Lars Kleinemeier
CONTACT Software GmbH, Wiener Str. 1-3, 28359 Bremen, Germany
Udo Göbel
CONTACT Software GmbH, Wiener Str. 1-3, 28359 Bremen, Germany