Private Estimation when Data and Privacy Demands are Correlated

2024-07-15
AI Summary
This paper studies empirical mean estimation for univariate data and frequency estimation for categorical data under heterogeneous differential privacy (DP), where each user specifies an individual privacy demand. The dataset is treated as worst-case, and both problems are studied in two formulations: one in which privacy demands may be correlated with the data, and one in which a random permutation of the dataset weakens those correlations. The proposed algorithms come with theoretical guarantees under both PAC error and mean squared error (MSE), and these guarantees translate to minimax optimality in several instances. Experiments across heterogeneous privacy configurations show the proposed method outperforming existing baselines, with reported accuracy improvements of 20%–50%.

πŸ“ Abstract
Differential Privacy (DP) is the current gold standard for ensuring privacy for statistical queries. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, both subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case, and we study both problems under two different formulations: first, where privacy demands and data may be correlated, and second, where correlations are weakened by random permutation of the dataset. We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error. These performance guarantees translate to minimax optimality in several instances, and experiments confirm superior performance of our algorithms over other baseline techniques.
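
To make the setting concrete, here is a minimal sketch of mean estimation under per-user privacy demands. It is not the paper's algorithm. It assumes data in [0, 1], publicly known privacy demands, and a standard Laplace mechanism applied to a weighted sum; the function name `hetero_private_mean` and the default weighting (proportional to each user's demand) are hypothetical choices made for illustration.

```python
import random


def hetero_private_mean(x, eps, weights=None, rng=None):
    """Weighted mean of x (clipped to [0, 1]) plus Laplace noise.

    Illustrative assumption, not the paper's method: changing user i's
    sample moves the weighted sum by at most weights[i], so Laplace noise
    of scale b gives user i a privacy level of weights[i] / b. Choosing
    b = max_i weights[i] / eps[i] meets every user's demand eps[i].
    """
    if len(x) == 0 or len(x) != len(eps):
        raise ValueError("x and eps must be non-empty and of equal length")
    if any(e <= 0 for e in eps):
        raise ValueError("privacy demands must be positive")

    # Clip so that each sample's influence is bounded by its weight.
    xs = [min(1.0, max(0.0, float(v))) for v in x]

    if weights is None:
        # Hypothetical default: weight each user in proportion to eps[i].
        total = sum(eps)
        weights = [e / total for e in eps]
    if (len(weights) != len(xs) or any(w < 0 for w in weights)
            or abs(sum(weights) - 1.0) > 1e-9):
        raise ValueError("weights must be non-negative and sum to 1")

    # Smallest Laplace scale that satisfies every user's demand.
    scale = max(w / e for w, e in zip(weights, eps))

    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with that scale.
    rng = rng or random.Random()
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

    estimate = sum(w * v for w, v in zip(weights, xs)) + noise
    return estimate, scale
```

With non-uniform weights the noise can be smaller than under uniform weighting, but the weighted sum itself may drift from the empirical mean when privacy demands are correlated with the data, which is the difficulty the abstract points to.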
Problem

Research questions and friction points this paper is trying to address.

Heterogeneous privacy constraints for data estimation
Correlation between privacy demands and data
Minimax optimal algorithms for private estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous privacy constraints for users
Handling correlated privacy demands and data
Random permutation to weaken data correlations
Syomantak Chaudhuri
University of California, Berkeley
T. Courtade
University of California, Berkeley