🤖 AI Summary
This study investigates whether sampling or record suppression, used as preprocessing steps in differential privacy (DP), can improve the privacy–utility trade-off. Combining theoretical analysis with empirical evaluation, the authors assess the impact of uniform sampling and general suppression strategies across the canonical DP mechanisms, including Laplace, Gaussian, exponential, and Report Noisy Max, as well as recent sampling-based applications such as clustering. They show that, when utility is compared at equal privacy levels, such preprocessing consistently degrades utility: the loss from omitting records outweighs the noise reduction that privacy amplification permits. The paper further derives privacy bounds for arbitrary suppression strategies under unbounded approximate DP and finds that the tested suppression strategy likewise fails to improve the trade-off, with uniform sampling emerging as one of the best suppression methods despite its still degrading effect. These findings challenge the common assumption in DP practice that such preprocessing enhances utility.
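A minimal Monte Carlo sketch of the kind of equal-privacy comparison described above, assuming a toy bounded-sum query, Poisson subsampling, and the standard pure-DP amplification bound; the data, parameters, and helper names (`amplified_eps`, `dp_sum_full`, `dp_sum_subsampled`) are illustrative and not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)


def amplified_eps(eps_total, q):
    """Largest base budget eps_base such that Poisson subsampling at rate q
    amplifies an eps_base-DP mechanism back to eps_total-DP (standard bound
    under add/remove adjacency): eps_total = ln(1 + q * (exp(eps_base) - 1))."""
    return np.log(1.0 + (np.exp(eps_total) - 1.0) / q)


def dp_sum_full(x, eps):
    """eps-DP estimate of sum(x) for values in [0, 1]; sensitivity is 1."""
    return x.sum() + rng.laplace(scale=1.0 / eps)


def dp_sum_subsampled(x, eps_total, q):
    """Same target privacy level, but with Poisson subsampling first:
    keep each record with probability q, spend the amplified base budget
    on the subsample sum, then rescale by 1/q to estimate the full sum."""
    keep = rng.random(x.size) < q
    eps_base = amplified_eps(eps_total, q)  # > eps_total, so less noise
    return (x[keep].sum() + rng.laplace(scale=1.0 / eps_base)) / q


n, eps, q, trials = 10_000, 1.0, 0.1, 2_000
x = rng.random(n)  # toy data in [0, 1]
target = x.sum()

err_full = [dp_sum_full(x, eps) - target for _ in range(trials)]
err_sub = [dp_sum_subsampled(x, eps, q) - target for _ in range(trials)]
print("full-data RMSE: ", np.sqrt(np.mean(np.square(err_full))))
print("subsampled RMSE:", np.sqrt(np.mean(np.square(err_sub))))
```

On this toy query the variance introduced by the omitted records dwarfs the noise reduction bought by the amplified budget, which illustrates (though of course does not prove) the flavor of penalty the study reports.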
📝 Abstract
Sampling is renowned for its privacy amplification in differential privacy (DP) and is often assumed to improve the utility of a DP mechanism by allowing a noise reduction. In this paper, we show that this last assumption is flawed: when measuring utility at equal privacy levels, sampling as preprocessing consistently incurs a penalty, because the utility lost by omitting records outweighs the gain from reduced noise. This holds across all canonical DP mechanisms -- Laplace, Gaussian, exponential, and report noisy max -- as well as recent applications of sampling, such as clustering. Extending this analysis, we investigate suppression as a generalized method of choosing, or omitting, records. Developing a theoretical analysis of this technique, we derive privacy bounds for arbitrary suppression strategies under unbounded approximate DP. We find that our tested suppression strategy also fails to improve the privacy--utility trade-off. Surprisingly, uniform sampling emerges as one of the best suppression methods, despite its still degrading effect. Our results call into question common preprocessing assumptions in DP practice.
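For reference, the amplification result that motivates the preprocessing assumption is, in its standard form (stated here for Poisson subsampling at rate $q$ under add/remove adjacency; the paper's own bounds for arbitrary suppression strategies may be different or tighter):

```latex
% If M is (\varepsilon, \delta)-DP, then M applied to a Poisson
% subsample with rate q is (\varepsilon', \delta')-DP, where
\varepsilon' = \log\bigl(1 + q\,(e^{\varepsilon} - 1)\bigr),
\qquad
\delta' = q\,\delta .
% Inverting at a fixed total budget \varepsilon' shows the larger
% base budget the subsampled mechanism may spend:
\varepsilon = \log\Bigl(1 + \tfrac{e^{\varepsilon'} - 1}{q}\Bigr).
```

Since $\varepsilon \ge \varepsilon'$, the subsampled mechanism may add less noise at the same overall privacy level; the paper's claim is that this noise reduction does not compensate for the utility lost on the records the subsample omits.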