🤖 AI Summary
The relationship between the worst-case privacy guarantees of differential privacy (DP) and actual adversary success rates under realistic attack models remains poorly understood. Method: We propose a unified analytical framework that extends classical DP guarantees to natural adversarial settings—such as multi-column data reconstruction and personally identifiable information (PII) extraction from language models—and systematically quantifies high-probability privacy leakage bounds. Our approach integrates DP theory, high-probability error analysis, noise-response modeling, and empirical attack simulation, explicitly characterizing how adversaries’ prior knowledge affects practical privacy risk. Contribution/Results: Evaluated on DP-trained language models and tabular datasets, our framework reveals that privacy risk for non-uniformly sensitive data strongly depends on the adversary’s prior success probability. The derived bounds yield finer-grained, more empirically grounded risk assessments than conventional DP analyses, significantly enhancing the interpretability and practical utility of DP under realistic threat models.
📝 Abstract
Differential Privacy (DP) is a family of definitions that bound the worst-case privacy leakage of a mechanism. One important feature of the worst-case DP guarantee is that it naturally implies protections against adversaries with less prior information, more sophisticated attack goals, and complex measures of a successful attack. However, the analytical tradeoffs between the adversarial model and the privacy protections conferred by DP have thus far not been well understood. To that end, this work sheds light on what the worst-case guarantee of DP implies about the success of attackers that are more representative of real-world privacy risks.
In this paper, we present a single flexible framework that generalizes and extends the patchwork of bounds on DP mechanisms found in prior work. Our framework allows us to compute high-probability guarantees for DP mechanisms in a large family of natural attack settings that previous bounds do not capture. One class of such settings is the approximate reconstruction of multiple individuals' data, such as inferring nearly entire columns of a tabular dataset from noisy marginals and extracting sensitive information from DP-trained language models.
We conduct two empirical case studies to illustrate the versatility of our bounds and compare them to the success of state-of-the-art attacks. Specifically, we study attacks that extract non-uniform PII from a DP-trained language model, as well as multi-column reconstruction attacks where the adversary has access to some columns in the clear and attempts to reconstruct the remaining columns of each person's record. We find that the absolute privacy risk of attacking non-uniform data is highly dependent on the adversary's prior probability of success. Our high-probability bounds give us a nuanced understanding of the privacy leakage of DP mechanisms in a variety of previously understudied attack settings.
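The dependence on the adversary's prior can be illustrated with the standard (ε, δ)-DP bound on any single event, P[A | D] ≤ e^ε · P[A | D′] + δ: the attacker's posterior success probability can exceed its prior by at most a multiplicative factor of e^ε, plus an additive δ. A minimal sketch of this baseline bound (not the paper's framework; the helper name `dp_success_bound` is ours for illustration):

```python
import math

def dp_success_bound(prior: float, epsilon: float, delta: float = 0.0) -> float:
    """Upper-bound an adversary's posterior success probability under
    (epsilon, delta)-DP via the standard single-event bound:
    P[success] <= e^epsilon * prior + delta, capped at 1."""
    return min(1.0, math.exp(epsilon) * prior + delta)

# The bound is only informative when the prior is small: for a likely
# guess (prior 0.5) it is vacuous, while for a rare secret (prior 0.001)
# it still certifies a low absolute risk.
for prior in (0.5, 0.1, 0.01, 0.001):
    print(f"prior={prior:<6} bound={dp_success_bound(prior, epsilon=2.0, delta=1e-5):.4f}")
```

This illustrates the abstract's observation that for non-uniform data (where different records have very different prior probabilities of being guessed), the absolute privacy risk certified by DP varies widely across records even at a fixed ε.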