Disjoint Generative Models

📅 2025-07-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the privacy risks of tabular data synthesis, this paper proposes a divide-and-conquer generative framework for cross-sectional data. The original dataset is partitioned into disjoint subsets, each modeled independently by a separate generative model instance, and the synthetic outputs are combined post hoc by a joining operation that requires no shared variables or identifiers. The framework allows different model types to be mixed within one synthesis and makes certain model types more effective and feasible. Its principal benefit is significantly increased privacy at only a low utility cost, demonstrated through case studies and examples on tabular data.

📝 Abstract
We propose a new framework for generating cross-sectional synthetic datasets via disjoint generative models. In this paradigm, a dataset is partitioned into disjoint subsets that are supplied to separate instances of generative models. The results are then combined post hoc by a joining operation that works in the absence of common variables/identifiers. The success of the framework is demonstrated through several case studies and examples on tabular data that help illuminate some of the design choices that one may make. The principal benefit of disjoint generative models is significantly increased privacy at only a low utility cost. Additional findings include increased effectiveness and feasibility for certain model types and the possibility of mixed-model synthesis.
Problem

Research questions and friction points this paper is trying to address.

Generating synthetic datasets without common identifiers
Enhancing privacy with minimal utility loss
Enabling mixed-model synthesis for diverse data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Disjoint generative models framework
Partitioned subsets for separate generation
Joining operation without common variables
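
The three steps listed above (partition, separate generation, join without common variables) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the split is assumed to be column-wise, the per-partition "generative model" is a bootstrap-resampling placeholder, and `naive_join` simply pairs rows at random. The function names `partition_columns`, `fit_and_sample`, `naive_join`, and `disjoint_synthesis` are hypothetical, as is the toy data.

```python
import random


def partition_columns(columns, n_parts, rng):
    """Split column names into n_parts disjoint groups (assumed column-wise split)."""
    shuffled = list(columns)
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def fit_and_sample(rows, cols, n_samples, rng):
    """Placeholder 'generative model': bootstrap-resample the projected rows.

    A real pipeline would train a synthesizer on this partition instead.
    """
    projected = [{c: row[c] for c in cols} for row in rows]
    return [dict(rng.choice(projected)) for _ in range(n_samples)]


def naive_join(parts, rng):
    """Hypothetical join: pair rows at random across partitions (no shared key)."""
    shuffled_parts = []
    for part in parts:
        part = list(part)
        rng.shuffle(part)
        shuffled_parts.append(part)
    joined = []
    for pieces in zip(*shuffled_parts):
        merged = {}
        for piece in pieces:
            merged.update(piece)
        joined.append(merged)
    return joined


def disjoint_synthesis(rows, n_parts=2, n_samples=None, seed=0):
    """Partition columns, synthesize each partition separately, then join."""
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    n_samples = n_samples or len(rows)
    groups = partition_columns(columns, n_parts, rng)
    parts = [fit_and_sample(rows, group, n_samples, rng) for group in groups]
    return groups, naive_join(parts, rng)


# Toy data, invented for this example only.
data = [
    {"age": 34, "income": 52000, "city": "Odense", "smoker": False},
    {"age": 51, "income": 61000, "city": "Aarhus", "smoker": True},
    {"age": 27, "income": 38000, "city": "Odense", "smoker": False},
    {"age": 45, "income": 70000, "city": "Vejle", "smoker": True},
]
groups, synthetic = disjoint_synthesis(data, n_parts=2, seed=1)
print(groups)
print(synthetic[0])
```

Random pairing discards any dependence between columns that land in different partitions, so this sketch is best read as a baseline for the joining step rather than as the paper's joining operation.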
Anton Danholt Lautrup
Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Muhammad Rajabinasab
Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Tobias Hyrup
Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Arthur Zimek
University of Southern Denmark
Data Mining, Outlier Detection, Clustering, High dimensional data, Ensemble Methods
Peter Schneider-Kamp
Professor of Computer Science, University of Southern Denmark
Artificial Intelligence, Automated Reasoning, Declarative Programming, Programming Languages, Software Verification