🤖 AI Summary
This paper addresses differentially private mean estimation from a stratified sample of a heterogeneous population, with the goal of minimizing estimation variance. Stratified sampling acts as a subsampling step that amplifies privacy, so each stratum needs a different nominal privacy budget, depending on its sampling rate, to reach the same final privacy guarantee. We propose the first privacy-aware optimal stratified sampling framework: the per-stratum sample sizes, and with them the nominal budgets, are chosen to minimize the variance of the resulting estimator under the Laplace, discrete Laplace, and Truncated-Uniform-Laplace mechanisms. We show that the variance objective is strongly convex, characterize structural properties of the optimal design, and develop an efficient algorithm that identifies the integer-optimal sample sizes. Conventional stratified sampling ignores the effect of DP and risks significant variance inflation; our design is built to avoid it.
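To make the budget adjustment concrete, here is a worked calculation. It assumes the standard amplification-by-subsampling bound for simple random sampling without replacement under the substitution neighboring relation. The symbols $n_h$ (sample size), $N_h$ (stratum size), $\varepsilon$ (target budget) and $\varepsilon_{0,h}$ (nominal budget) are our own notation, and the paper's privacy accounting may differ.

```latex
% Assumed bound: an \varepsilon_0-DP mechanism applied to n_h of the N_h units of
% stratum h, drawn without replacement, is \varepsilon-DP with respect to the stratum.
\varepsilon = \log\!\Bigl(1 + \tfrac{n_h}{N_h}\bigl(e^{\varepsilon_0} - 1\bigr)\Bigr)
\quad\Longleftrightarrow\quad
\varepsilon_{0,h} = \log\!\Bigl(1 + \tfrac{N_h}{n_h}\bigl(e^{\varepsilon} - 1\bigr)\Bigr).

% Example: target \varepsilon = 1 with sampling fraction n_h / N_h = 0.1
\varepsilon_{0,h} = \log\bigl(1 + 10\,(e - 1)\bigr) = \log(18.18\ldots) \approx 2.90.
```

A smaller sampling fraction permits a larger nominal budget, which means less noise per released statistic. It also leaves fewer units to average, and the variance-minimizing design has to balance these two effects.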
📝 Abstract
This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance of general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation, which amplifies the privacy guarantee; however, to achieve the same final privacy guarantee for each group, different nominal privacy budgets must be used depending on the subsampling rate. Traditional stratified sampling strategies ignore the effect of DP and therefore risk significant variance inflation. We formulate optimal survey design as an optimization problem that determines the subsampling size for each group so as to minimize the variance of the resulting estimator. We establish strong convexity of the variance objective, propose an efficient algorithm to identify the integer-optimal design, and offer insights into the structure of the optimal design.
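The optimization described above can be sketched for the Laplace mechanism alone. This is a minimal illustration, not the authors' algorithm. It rests on several assumptions of our own:

- values lie in an interval of length `width`;
- each stratum mean is released with Laplace noise whose scale is (width / n_h) / eps0;
- eps0 is the nominal budget from the amplification bound eps = log(1 + (n_h/N_h)(e^{eps0} - 1));
- strata are disjoint, so each stratum's eps-DP release gives eps-DP overall;
- sampling variance is the finite-population term (1 - n_h/N_h) S_h^2 / n_h.

Integer sizes are found by greedy marginal allocation, which is a swapped-in technique and not the paper's method. Under these assumptions each per-stratum term is convex in n_h, and that is the condition under which greedy marginal allocation is exact. The function names are hypothetical. DLap and TuLap would require different noise-variance terms.

```python
import heapq
import math
from typing import List, Sequence


def nominal_epsilon(eps: float, n_h: int, N_h: int) -> float:
    """Nominal budget eps0 such that drawing n_h of N_h units without replacement
    and then running an eps0-DP mechanism is eps-DP for the stratum.

    Assumed amplification bound: eps = log(1 + (n_h / N_h) * (exp(eps0) - 1)).
    """
    return math.log1p((N_h / n_h) * math.expm1(eps))


def stratum_variance(n_h: int, N_h: int, S2_h: float, W_h: float,
                     eps: float, width: float) -> float:
    """Variance contribution W_h^2 * (sampling variance + Laplace noise variance)."""
    # Finite-population variance of the sample mean under sampling without replacement.
    sampling = (1.0 - n_h / N_h) * S2_h / n_h
    # Laplace scale = sensitivity / eps0, where the sensitivity of a mean of
    # n_h values in an interval of length `width` is width / n_h (assumption).
    scale = width / (n_h * nominal_epsilon(eps, n_h, N_h))
    noise = 2.0 * scale ** 2  # Var(Laplace(0, scale)) = 2 * scale^2
    return W_h ** 2 * (sampling + noise)


def total_variance(sizes: Sequence[int], N: Sequence[int], S2: Sequence[float],
                   eps: float, width: float) -> float:
    """Variance of the stratified estimator sum_h W_h * (noisy mean of stratum h)."""
    total_N = sum(N)
    return sum(stratum_variance(n_h, N_h, s2, N_h / total_N, eps, width)
               for n_h, N_h, s2 in zip(sizes, N, S2))


def greedy_allocation(n: int, N: Sequence[int], S2: Sequence[float],
                      eps: float, width: float) -> List[int]:
    """Integer sizes with 1 <= n_h <= N_h and sum(n_h) = n, via greedy marginal
    allocation: each step gives one unit to the stratum whose variance drops most.
    Exact when every per-stratum term is convex in n_h."""
    H = len(N)
    if not (H <= n <= sum(N)):
        raise ValueError("need len(N) <= n <= sum(N)")
    total_N = sum(N)
    W = [N_h / total_N for N_h in N]
    sizes = [1] * H

    def gain(h: int) -> float:
        """Variance reduction from giving stratum h one more sampled unit."""
        cur = stratum_variance(sizes[h], N[h], S2[h], W[h], eps, width)
        nxt = stratum_variance(sizes[h] + 1, N[h], S2[h], W[h], eps, width)
        return cur - nxt

    # Max-heap via negated gains, over strata that still have unsampled units.
    heap = [(-gain(h), h) for h in range(H) if sizes[h] < N[h]]
    heapq.heapify(heap)
    for _ in range(n - H):
        _, h = heapq.heappop(heap)
        sizes[h] += 1
        # The objective is separable, so only stratum h's gain needs refreshing.
        if sizes[h] < N[h]:
            heapq.heappush(heap, (-gain(h), h))
    return sizes


if __name__ == "__main__":
    N = [500, 300, 200]   # hypothetical stratum sizes
    S2 = [4.0, 9.0, 1.0]  # hypothetical within-stratum variances
    sizes = greedy_allocation(60, N, S2, eps=1.0, width=10.0)
    print(sizes, total_variance(sizes, N, S2, 1.0, 10.0))
```

The heap makes each allocation step cost O(log H). The sketch keeps the same target eps for every stratum and lets only the nominal budget vary with the sampling fraction, which mirrors the setup described in the abstract.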