🤖 AI Summary
This work addresses the severe degradation of privacy protection for high-influence records (e.g., top earners in income data or very large establishments in payroll data) when traditional differential privacy mechanisms are applied to highly skewed datasets. We propose mechanisms within the per-record differential privacy framework whose privacy loss grows only logarithmically with a record's influence. Our method achieves graceful privacy degradation in unbounded statistical tasks (e.g., summation) via influence-adaptive noise scaling and theory-driven sensitivity control. Crucially, where existing mechanisms let privacy loss grow linearly or quadratically with influence, we establish, for the first time, a provable logarithmic growth guarantee. Experiments on real-world corporate salary data demonstrate that our approach significantly strengthens privacy protection for high-value records while reducing utility loss by over 40% compared to baseline methods. Moreover, it preserves estimator unbiasedness while satisfying a formal per-record differential privacy guarantee.
📝 Abstract
We develop formal privacy mechanisms for releasing statistics from data with many outlying values, such as income data. These mechanisms ensure that a per-record differential privacy guarantee degrades slowly in the protected records' influence on the statistics being released. Formal privacy mechanisms generally add randomness, or "noise," to published statistics. If a noisy statistic's distribution changes little with the addition or deletion of a single record in the underlying dataset, an attacker looking at this statistic will find it plausible that any particular record was present or absent, preserving the records' privacy. More influential records (those whose addition or deletion would change the statistics' distribution more) typically suffer greater privacy loss. The per-record differential privacy framework quantifies these record-specific privacy guarantees, but existing mechanisms let these guarantees degrade rapidly (linearly or quadratically) with influence. While this may be acceptable in cases with some moderately influential records, it results in unacceptably high privacy losses when records' influence varies widely, as is common in economic data. We develop mechanisms with privacy guarantees that instead degrade as slowly as logarithmically with influence. These mechanisms allow for the accurate, unbiased release of statistics, while providing meaningful protection for highly influential records. As an example, we consider the private release of sums of unbounded establishment data such as payroll, where our mechanisms extend meaningful privacy protection even to very large establishments. We evaluate these mechanisms empirically and demonstrate their utility.
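
To make the "linear" degradation concrete, the following is a minimal sketch of a baseline, not the mechanisms developed in the paper. It assumes a sum released with additive Laplace noise of fixed scale, in which case adding or deleting a record with value x shifts the output distribution by x and the log density ratio is bounded by |x| / scale. All function names, the payroll figures, and the noise scale are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_sum(values, scale, rng):
    # Baseline release: true sum plus Laplace noise of a fixed scale.
    return sum(values) + laplace_noise(scale, rng)


def per_record_epsilon_laplace(value, scale):
    # Worst-case log density ratio between "record present" and
    # "record absent" for the baseline above: linear in |value|.
    return abs(value) / scale


def log_density_ratio(output, total_with, value, scale):
    # log p(output | record present) - log p(output | record absent)
    # for Laplace noise centred on the respective true sums.
    total_without = total_with - value
    return (abs(output - total_without) - abs(output - total_with)) / scale


if __name__ == "__main__":
    payrolls = [50_000, 120_000, 5_000_000]  # hypothetical establishments
    scale = 100_000                          # hypothetical noise scale
    rng = random.Random(0)
    print("noisy sum:", noisy_sum(payrolls, scale, rng))
    for x in payrolls:
        print(x, per_record_epsilon_laplace(x, scale))
```

Under these assumed numbers, the largest establishment's privacy loss bound is 100 times that of the smallest, mirroring the 100-fold gap in payroll. A bound growing logarithmically with influence, as the abstract describes, would widen far more slowly across such a range.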