🤖 AI Summary
Existing differentially private synthetic tabular data methods (e.g., AIM) suffer from memory explosion, computational inefficiency, and the need for full retraining whenever the graph structure changes in high-dimensional settings; generative approaches like GEM improve scalability, but their empirical validation has been limited to small-scale datasets. This paper proposes GEM+, a framework that integrates AIM's adaptive measurement mechanism with GEM's generative neural network, enabling end-to-end, differentially private synthesis of high-dimensional tabular data with over one hundred columns. By adaptively selecting low-order marginals, measuring them with calibrated noise, and training the generative network on those noisy measurements, the framework drastically reduces memory footprint and training overhead. Extensive experiments on multiple benchmark datasets demonstrate superior data utility and computational efficiency over AIM and other baselines; notably, GEM+ successfully synthesizes large-scale, high-dimensional datasets on which AIM fails due to resource constraints.
📝 Abstract
State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iteratively measure low-order noisy marginals and fit graphical models to produce synthetic data, enabling systematic optimisation of data quality under privacy constraints. Graphical models, however, are inefficient for high-dimensional data because they require substantial memory and must be retrained from scratch whenever the graph structure changes, leading to significant computational overhead. Recent methods, like GEM, overcome these limitations by using generator neural networks for improved scalability. However, empirical comparisons have mostly focused on small datasets, limiting real-world applicability. In this work, we introduce GEM+, which integrates AIM's adaptive measurement framework with GEM's scalable generator network. Our experiments show that GEM+ outperforms AIM in both utility and scalability, delivering state-of-the-art results while efficiently handling datasets with over a hundred columns, where AIM fails due to memory and computational overheads.
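The 'select-measure-generate' loop the abstract describes can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's algorithm: the dataset, noise scale `sigma`, and the crude resampling "generate" step are placeholders. A real AIM/GEM+ pipeline would select marginals privately (e.g., via the exponential mechanism), calibrate the Gaussian noise to a privacy budget, and fit a graphical model (AIM) or train a generator network (GEM) instead of resampling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy categorical table: 3 columns, each with a domain of size 3 (hypothetical).
data = rng.integers(0, 3, size=(1000, 3))
domains = [3, 3, 3]

def marginal(table, cols, domains):
    """Flattened, normalised histogram over a subset of columns (a low-order marginal)."""
    sizes = [domains[c] for c in cols]
    idx = np.ravel_multi_index([table[:, c] for c in cols], sizes)
    hist = np.bincount(idx, minlength=int(np.prod(sizes))).astype(float)
    return hist / hist.sum()

# Candidate low-order marginals: all column pairs.
candidates = [(i, j) for i in range(3) for j in range(i + 1, 3)]

# Start from a uniform synthetic table.
synth = rng.integers(0, 3, size=(1000, 3))

sigma = 0.01  # placeholder noise scale; NOT calibrated to any (epsilon, delta)
for _ in range(5):
    # SELECT: pick the marginal where the synthetic data errs most.
    # (In AIM this selection itself consumes privacy budget via the
    # exponential mechanism; here it is done in the clear for brevity.)
    errors = [np.abs(marginal(data, q, domains) - marginal(synth, q, domains)).sum()
              for q in candidates]
    q = candidates[int(np.argmax(errors))]

    # MEASURE: noisy answer to the selected marginal (Gaussian mechanism).
    sizes = tuple(domains[c] for c in q)
    n = int(np.prod(sizes))
    noisy = marginal(data, q, domains) + rng.normal(0, sigma, size=n)

    # GENERATE: crudely resample the selected columns to match the noisy
    # marginal; AIM fits a graphical model here, GEM trains a generator net.
    probs = np.clip(noisy, 1e-12, None)
    probs /= probs.sum()
    idx = rng.choice(n, size=len(synth), p=probs)
    synth[:, list(q)] = np.stack(np.unravel_index(idx, sizes), axis=1)
```

The key scalability point from the abstract shows up in the last step: a graphical model must be refit from scratch when the set of measured marginals (the graph structure) changes, whereas a generator network can simply continue training on the enlarged set of noisy measurements.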