🤖 AI Summary
Problem: Existing methods for releasing marginal queries under differential privacy, such as HDMM, break down on high-dimensional data (up to 100 attributes). They suffer from memory explosion and computational intractability, especially for composite workloads that combine marginals with range or prefix-sum queries.
Method: We propose ResidualPlanner, an efficient matrix mechanism that adds Gaussian noise, produces unbiased answers, and finds the globally optimal noise allocation for any loss function expressible as a convex function of the marginal variances. The approach combines residual-based planning, convex optimization modeling, sparse linear algebra, and analytical computation of variance–covariance matrices.
Results: Experiments demonstrate scalability: optimizing tens of thousands of marginals takes seconds, and datasets with 100 attributes are processed in a couple of minutes. Memory consumption is reduced by one to two orders of magnitude. The method is also the first to output exact per-marginal variances and covariances at scale, which enables principled downstream analysis and adaptive query answering in large-scale differentially private data release.
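The Method paragraph states that the noise allocation is optimized for losses that are convex in the marginal variances. As a hedged illustration of why such problems are tractable, the Python sketch below solves the simplest case, a weighted sum of variances, in closed form. It is not ResidualPlanner's algorithm. It assumes a hypothetical setup in which each measured query group `A` receives independent Gaussian noise with variance `v_A`, the loss is `sum_A w_A * v_A`, and the privacy constraint has the form `sum_A c_A / v_A <= rho`. That constraint has the shape of zero-concentrated DP (zCDP) accounting for the Gaussian mechanism, where a measurement with L2 sensitivity Δ and noise variance σ² costs Δ²/(2σ²) and costs add under composition. The names `optimal_variances`, `weights`, `costs`, and `rho` are illustrative.

```python
import math
from typing import Dict, Tuple


def optimal_variances(
    weights: Dict[str, float],  # hypothetical w_A: importance of group A in the loss
    costs: Dict[str, float],    # hypothetical c_A: privacy cost is c_A / v_A
    rho: float,                 # total privacy budget
) -> Tuple[Dict[str, float], float]:
    """Minimize sum_A w_A * v_A subject to sum_A c_A / v_A <= rho.

    Setting the Lagrangian derivative to zero gives w_A = lam * c_A / v_A**2,
    so v_A = sqrt(lam * c_A / w_A). Plugging into the (tight) budget
    constraint yields sqrt(lam) = S / rho with S = sum_A sqrt(c_A * w_A).
    """
    if rho <= 0:
        raise ValueError("privacy budget must be positive")
    if any(weights[a] <= 0 or costs[a] <= 0 for a in weights):
        raise ValueError("weights and costs must be positive")
    s = sum(math.sqrt(costs[a] * weights[a]) for a in weights)
    variances = {a: (s / rho) * math.sqrt(costs[a] / weights[a]) for a in weights}
    loss = s * s / rho  # optimal value of sum_A w_A * v_A
    return variances, loss


if __name__ == "__main__":
    v, loss = optimal_variances({"A": 1.0, "B": 4.0}, {"A": 1.0, "B": 1.0}, rho=1.0)
    print(v, loss)  # {'A': 3.0, 'B': 1.5} 9.0
```

In this example the budget is spent exactly (1/3.0 + 1/1.5 = 1), and the group that the loss weights more heavily receives less noise. Other convex losses, such as the maximum variance, generally have no closed form and would need a numerical convex solver; how the paper handles those is not reproduced here.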
📝 Abstract
Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner, a matrix mechanism for marginals with Gaussian noise that is both optimal and scalable. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets).
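The abstract defines matrix mechanisms as mechanisms that give unbiased noisy answers to linear queries such as marginals. The sketch below illustrates the idea with the simplest possible strategy: measure every cell of the full histogram with independent Gaussian noise (a strategy matrix equal to the identity) and answer each marginal by summing noisy cells. This is a baseline for illustration, not ResidualPlanner's strategy. Because each marginal is a linear query `W`, its noisy answer `W y` is unbiased, and its covariance with another marginal `W'` is exactly σ² W W'ᵀ, so variances and covariances can be computed analytically without sampling. The domain sizes, histogram, σ, and helper names (`marginal_matrix`, `matvec`, `noisy_answer`, `covariance`) are hypothetical. The explicit matrices have one column per cell of the full domain (2^100 columns for 100 binary attributes), which is why this naive representation cannot scale.

```python
import itertools
import random
from typing import List, Sequence

Matrix = List[List[float]]


def marginal_matrix(sizes: Sequence[int], keep: Sequence[int]) -> Matrix:
    """Hypothetical helper: the 0/1 query matrix of the marginal on `keep`.

    Columns index cells of the full domain and rows index cells of the
    marginal, both in row-major order; entry (r, c) is 1 when full cell c
    projects onto marginal cell r.
    """
    full_cells = list(itertools.product(*(range(n) for n in sizes)))
    marg_cells = list(itertools.product(*(range(sizes[i]) for i in keep)))
    row_of = {cell: r for r, cell in enumerate(marg_cells)}
    w = [[0.0] * len(full_cells) for _ in marg_cells]
    for c, cell in enumerate(full_cells):
        w[row_of[tuple(cell[i] for i in keep)]][c] = 1.0
    return w


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Exact (noise-free) answers W x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]


def noisy_answer(w: Matrix, x: Sequence[float], sigma: float,
                 rng: random.Random) -> List[float]:
    # Measure every cell with independent N(0, sigma^2) noise (identity strategy),
    y = [xi + rng.gauss(0.0, sigma) for xi in x]
    # then reconstruct W y, which is unbiased because E[W y] = W E[y] = W x.
    return matvec(w, y)


def covariance(w1: Matrix, w2: Matrix, sigma: float) -> Matrix:
    """Exact Cov(W1 y, W2 y) = W1 (sigma^2 I) W2^T = sigma^2 W1 W2^T."""
    return [[sigma ** 2 * sum(a * b for a, b in zip(r1, r2)) for r2 in w2]
            for r1 in w1]


if __name__ == "__main__":
    sizes = [2, 3]          # hypothetical attributes with 2 and 3 values
    x = [5, 0, 2, 1, 4, 3]  # hypothetical histogram over the 6 full cells
    w0 = marginal_matrix(sizes, keep=[0])
    w1 = marginal_matrix(sizes, keep=[1])
    print(matvec(w0, x))                  # [7.0, 8.0]
    print(covariance(w0, w0, sigma=2.0))  # [[12.0, 0.0], [0.0, 12.0]]
    print(covariance(w0, w1, sigma=2.0))  # [[4.0, 4.0, 4.0], [4.0, 4.0, 4.0]]
    print(noisy_answer(w0, x, 2.0, random.Random(0)))
```

Each answer of the 1-way marginal on the first attribute sums three noisy cells, so with σ = 2 its variance is 3 × 4 = 12, and the two answers are uncorrelated because they share no cells. An answer on the first attribute and one on the second share exactly one cell, so their covariance is σ² = 4.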