Privacy Vulnerabilities in Marginals-based Synthetic Data

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work exposes a critical privacy vulnerability in marginal-probability-based differentially private synthetic data generation (DP-SDG): the marginal structures preserved to ensure statistical fidelity can be reverse-engineered to recover individual membership with high accuracy. To address this, the authors propose MAMA-MIAβ€”the first white-box membership inference attack specifically designed for marginal-preserving DP-SDG algorithms. By modeling internal algorithmic mechanisms and leveraging statistical bias and marginal consistency constraints, MAMA-MIA enables efficient, high-precision membership inference against prominent marginal-based DP-SDG methods, including MST, PrivBayes, and Private-GSD. Experiments demonstrate that MAMA-MIA achieves an average membership inference accuracy of 92% across these three algorithms, operates 10³–10⁴ times faster than prior attacks, and won first place in the inaugural SNAKE Privacy Attack Competition.
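To make the core intuition concrete: a marginal-preserving generator reproduces the joint frequencies of small attribute subsets, so a record whose attribute combinations all appear with high probability in the synthetic data is more likely to have shaped those marginals. The sketch below is not the paper's MAMA-MIA attack; it is a toy illustration of scoring a candidate record against k-way marginals estimated from synthetic data, with hypothetical helper names (`marginal_tables`, `membership_score`) chosen for this example.

```python
from collections import Counter
from itertools import combinations

def marginal_tables(rows, k=2):
    """Estimate every k-way marginal distribution from a list of records.

    Each record is a tuple of categorical attribute values. Returns a dict
    mapping each k-tuple of attribute indices to a dict of value-combination
    probabilities.
    """
    n = len(rows)
    n_attrs = len(rows[0])
    tables = {}
    for idx in combinations(range(n_attrs), k):
        counts = Counter(tuple(r[i] for i in idx) for r in rows)
        tables[idx] = {vals: c / n for vals, c in counts.items()}
    return tables

def membership_score(record, tables):
    """Average marginal probability of the record's attribute combinations.

    Higher scores mean the record fits the preserved marginals well -- in
    this toy model, weak evidence that the record was in the training data.
    """
    score = 0.0
    for idx, table in tables.items():
        score += table.get(tuple(record[i] for i in idx), 0.0)
    return score / len(tables)
```

A record present in the data scores higher than an implausible one, which is the signal a real attack would calibrate, denoise, and combine with knowledge of the specific SDG algorithm's internals.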

πŸ“ Abstract
When acting as a privacy-enhancing technology, synthetic data generation (SDG) aims to maintain a resemblance to the real data while excluding personally-identifiable information. Many SDG algorithms provide robust differential privacy (DP) guarantees to this end. However, we show that the strongest class of SDG algorithms--those that preserve extit{marginal probabilities}, or similar statistics, from the underlying data--leak information about individuals that can be recovered more efficiently than previously understood. We demonstrate this by presenting a novel membership inference attack, MAMA-MIA, and evaluate it against three seminal DP SDG algorithms: MST, PrivBayes, and Private-GSD. MAMA-MIA leverages knowledge of which SDG algorithm was used, allowing it to learn information about the hidden data more accurately, and orders-of-magnitude faster, than other leading attacks. We use MAMA-MIA to lend insight into existing SDG vulnerabilities. Our approach went on to win the first SNAKE (SaNitization Algorithm under attacK ... $varepsilon$) competition.
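For context on the DP side of the abstract: marginal-based DP-SDG algorithms typically protect the measured marginals themselves with calibrated noise, e.g. the standard Laplace mechanism, before fitting a generator to them. The sketch below shows that generic mechanism on a single 1-way marginal; it is an assumption-laden illustration (the function name `dp_marginal` and the unit sensitivity are choices for this example), not the noise step of MST, PrivBayes, or Private-GSD specifically.

```python
import random

def dp_marginal(counts, epsilon, sensitivity=1.0):
    """Release a 1-way marginal under epsilon-DP via the Laplace mechanism.

    counts: dict mapping attribute value -> true count in the private data.
    Assumes adding or removing one record changes each count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon suffices.
    """
    scale = sensitivity / epsilon
    # A Laplace(scale) draw is the difference of two Exp(mean=scale) draws.
    noisy = {v: c + random.expovariate(1 / scale) - random.expovariate(1 / scale)
             for v, c in counts.items()}
    # Post-processing (clamping and renormalizing) preserves the DP guarantee.
    clamped = {v: max(c, 0.0) for v, c in noisy.items()}
    total = sum(clamped.values())
    if total == 0:
        return {v: 1.0 / len(clamped) for v in clamped}  # degenerate fallback
    return {v: c / total for v, c in clamped.items()}
```

The paper's point is that even with this noise, the *structure* of which marginals a given algorithm measures, and how consistently they fit together, still leaks membership signal to an attacker who knows the algorithm.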
Problem

Research questions and friction points this paper is trying to address.

Privacy leaks in marginal-based synthetic data generation
Efficient membership inference attacks on DP SDG algorithms
Vulnerabilities in MST, PrivBayes, and Private-GSD algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel membership inference attack MAMA-MIA
Targets marginal-based synthetic data algorithms
Outperforms prior attacks in speed and accuracy