GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks

📅 2025-11-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing differentially private synthetic tabular data methods (e.g., AIM) suffer from memory explosion, computational inefficiency, and the need for full retraining upon graph-structure changes in high-dimensional settings; while generative approaches like GEM improve scalability, their empirical validation remains limited to small-scale datasets. This paper proposes a framework that integrates AIM’s adaptive measurement mechanism with GEM’s generative neural network, enabling end-to-end, differentially private synthesis of high-dimensional tabular data with over one hundred columns. By adaptively selecting and injecting calibrated noise into low-order marginal distributions, and jointly optimizing the generative network, the framework drastically reduces memory footprint and training overhead. Extensive experiments on multiple benchmark datasets demonstrate superior data utility and computational efficiency over AIM and other baselines. Notably, the method successfully synthesizes large-scale, high-dimensional datasets on which AIM fails due to resource constraints.

📝 Abstract
State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit graphical models to produce synthetic data, enabling systematic optimisation of data quality under privacy constraints. Graphical models, however, are inefficient for high-dimensional data because they require substantial memory and must be retrained from scratch whenever the graph structure changes, leading to significant computational overhead. Recent methods, like GEM, overcome these limitations by using generator neural networks for improved scalability. However, empirical comparisons have mostly focused on small datasets, limiting real-world applicability. In this work, we introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results while efficiently handling datasets with over a hundred columns, where AIM fails due to memory and computational overheads.
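The 'select-measure-generate' loop described in the abstract can be illustrated with a toy sketch. Everything below (column count, candidate marginals, the noise scale `sigma`) is illustrative and not taken from the paper: the loop greedily selects the worst-fit pairwise marginal, measures it under Gaussian noise, and crudely resamples the selected columns, whereas GEM+ privatises the selection step and replaces the resampling with updates to a generator network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 6 categorical columns with 3 levels each, where column pairs
# (0,1), (2,3) and (4,5) are perfectly correlated.
n = 1000
data = np.empty((n, 6), dtype=int)
for a, b in [(0, 1), (2, 3), (4, 5)]:
    data[:, a] = rng.integers(0, 3, size=n)
    data[:, b] = data[:, a]

def marginal(d, cols, k=3):
    """Normalised joint histogram over a subset of columns."""
    idx = np.zeros(len(d), dtype=int)
    for c in cols:
        idx = idx * k + d[:, c]
    return np.bincount(idx, minlength=k ** len(cols)) / len(d)

synthetic = rng.integers(0, 3, size=(n, 6))   # stands in for the generator's output
candidates = [(0, 1), (2, 3), (4, 5)]         # low-order marginals to choose from
sigma = 0.01                                  # noise scale, set by the privacy budget

for _ in range(6):
    # SELECT: pick the candidate marginal the synthetic data currently fits worst.
    # (AIM privatises this choice as well, e.g. with the exponential mechanism.)
    errors = [np.abs(marginal(data, c) - marginal(synthetic, c)).sum()
              for c in candidates]
    cols = candidates[int(np.argmax(errors))]
    # MEASURE: the true marginal plus calibrated Gaussian noise.
    noisy = marginal(data, cols) + rng.normal(0.0, sigma, size=3 ** len(cols))
    # GENERATE: here we simply resample the selected columns from the noisy
    # marginal; GEM+ instead updates a generator network on all measurements so far.
    probs = np.clip(noisy, 0, None)
    probs /= probs.sum()
    draws = rng.choice(len(probs), size=n, p=probs)
    synthetic[:, cols[0]], synthetic[:, cols[1]] = draws // 3, draws % 3
```

After a few rounds the synthetic data's selected marginals track the noisy measurements; the paper's point is that a generator network makes this generation step scale to hundreds of columns, where fitting a graphical model would exhaust memory.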
Problem

Research questions and friction points this paper is trying to address.

Graphical models require excessive memory and computational resources for high-dimensional data
Existing generator network methods lack evaluation on large real-world datasets
Current approaches struggle to balance privacy constraints with data utility and scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates adaptive measurement with scalable generator networks
Outperforms AIM in utility and scalability metrics
Handles high-dimensional datasets with over a hundred columns
Samuel Maddock
Meta Platforms, Inc., London, UK
Shripad Gade
Meta Platforms, Inc., Menlo Park, CA, USA
Graham Cormode
Meta AI, University of Warwick
Algorithms · Data Analysis · Hapax Legomenon · Privacy
Will Bullock
Meta Platforms, Inc., Menlo Park, CA, USA