An Optimal and Scalable Matrix Mechanism for Noisy Marginals under Convex Loss Functions

📅 2023-05-14
🏛️ Neural Information Processing Systems
📈 Citations: 4
Influential: 0

🤖 AI Summary
Problem: Existing methods (e.g., HDMM) for releasing marginal queries over high-dimensional data (up to 100 dimensions) under differential privacy suffer from memory explosion and computational intractability, especially for composite workloads combining marginals with range or prefix-sum queries.
Method: We propose an efficient, unbiased Gaussian noise matrix mechanism that enables global optimization of arbitrary loss objectives expressible as convex functions of marginal variances. Our approach integrates residual planning (ResidualPlanner), convex optimization modeling, sparse linear algebra acceleration, and analytical computation of variance–covariance matrices.
Results: Experiments demonstrate scalability: optimizing tens of thousands of marginals completes in seconds; hundred-attribute datasets are processed within two minutes. Memory consumption is reduced by one to two orders of magnitude. Crucially, our method is the first to support exact per-marginal variance and covariance output at scale, enabling principled downstream analysis and adaptive query answering in large-scale differentially private data release.
📝 Abstract
Noisy marginals are a common form of confidentiality-protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner, a matrix mechanism for marginals with Gaussian noise that is both optimal and scalable. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large-scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets).
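To make the abstract's terms concrete, here is a minimal sketch of a noisy marginal: a contingency table is summed down to a subset of attributes, and independent zero-mean Gaussian noise is added to each cell, so the answer is unbiased. This is not ResidualPlanner itself, which selects and optimizes the noisy measurements; the function names, the toy table, and the value of `sigma` are assumptions for illustration, and `sigma` is not calibrated to any privacy budget here. The last two helpers show two example losses that are convex functions of marginal variances.

```python
import numpy as np

def marginal(table, keep_axes):
    """Sum a contingency table over every axis not listed in keep_axes."""
    drop = tuple(ax for ax in range(table.ndim) if ax not in keep_axes)
    return table.sum(axis=drop)

def noisy_marginal(table, keep_axes, sigma, rng):
    """Exact marginal plus independent zero-mean Gaussian noise per cell.

    Because the noise has mean zero, the noisy answer is unbiased and
    each cell has variance sigma**2 (sigma is an illustrative value).
    """
    exact = marginal(table, keep_axes)
    return exact + rng.normal(loc=0.0, scale=sigma, size=exact.shape)

def sum_of_variances(variances):
    """Example convex loss: total variance across marginals."""
    return float(np.sum(variances))

def max_variance(variances):
    """Example convex loss: worst-case variance across marginals."""
    return float(np.max(variances))

rng = np.random.default_rng(0)
table = rng.integers(0, 10, size=(2, 3, 4))  # toy table over 3 attributes
exact = marginal(table, keep_axes=(0, 2))    # marginal on attributes 0 and 2
noisy = noisy_marginal(table, keep_axes=(0, 2), sigma=1.0, rng=rng)
```

In this naive sketch every marginal is measured directly with the same noise scale; a matrix mechanism instead chooses which linear queries to measure and how much noise to spend on each so that a chosen loss over the marginal variances is minimized.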
Problem

Research questions and friction points this paper is trying to address.

Scalable matrix mechanism for noisy marginals and complex queries
Optimizes accuracy of marginals efficiently in large-scale settings
Supports custom workloads combining marginals and range/prefix-sum queries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable matrix mechanism for marginal queries
Optimizes various convex loss functions efficiently
Supports complex workloads including custom queries