Privacy Vulnerabilities in Marginals-based Synthetic Data

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work exposes a critical privacy vulnerability in marginal-probability-based differentially private synthetic data generation (DP-SDG): the marginal structures preserved to ensure statistical fidelity can be reverse-engineered to recover individual membership with high accuracy. To address this, the authors propose MAMA-MIAβ€”the first white-box membership inference attack specifically designed for marginal-preserving DP-SDG algorithms. By modeling internal algorithmic mechanisms and leveraging statistical bias and marginal consistency constraints, MAMA-MIA enables efficient, high-precision membership inference against prominent marginal-based DP-SDG methods, including MST, PrivBayes, and Private-GSD. Experiments demonstrate that MAMA-MIA achieves an average membership inference accuracy of 92% across these three algorithms, operates 10³–10⁴ times faster than prior attacks, and won first place in the inaugural SNAKE Privacy Attack Competition.
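To make the core intuition concrete: a marginal-preserving generator reproduces the joint frequencies of small attribute subsets, so a record whose attribute combinations all appear with high probability in the synthetic data is more likely to have shaped those marginals. The sketch below is not the paper's MAMA-MIA attack; it is a toy illustration of scoring a candidate record against k-way marginals estimated from synthetic data, with hypothetical helper names (`marginal_tables`, `membership_score`) chosen for this example.

```python
from collections import Counter
from itertools import combinations

def marginal_tables(rows, k=2):
    """Estimate every k-way marginal distribution from a list of records.

    Each record is a tuple of categorical attribute values. Returns a dict
    mapping each k-tuple of attribute indices to a dict of value-combination
    probabilities.
    """
    n = len(rows)
    n_attrs = len(rows[0])
    tables = {}
    for idx in combinations(range(n_attrs), k):
        counts = Counter(tuple(r[i] for i in idx) for r in rows)
        tables[idx] = {vals: c / n for vals, c in counts.items()}
    return tables

def membership_score(record, tables):
    """Average marginal probability of the record's attribute combinations.

    Higher scores mean the record fits the preserved marginals well -- in
    this toy model, weak evidence that the record was in the training data.
    """
    score = 0.0
    for idx, table in tables.items():
        score += table.get(tuple(record[i] for i in idx), 0.0)
    return score / len(tables)
```

A record present in the data scores higher than an implausible one, which is the signal a real attack would calibrate, denoise, and combine with knowledge of the specific SDG algorithm's internals.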

πŸ“ Abstract
When acting as a privacy-enhancing technology, synthetic data generation (SDG) aims to maintain a resemblance to the real data while excluding personally-identifiable information. Many SDG algorithms provide robust differential privacy (DP) guarantees to this end. However, we show that the strongest class of SDG algorithms--those that preserve extit{marginal probabilities}, or similar statistics, from the underlying data--leak information about individuals that can be recovered more efficiently than previously understood. We demonstrate this by presenting a novel membership inference attack, MAMA-MIA, and evaluate it against three seminal DP SDG algorithms: MST, PrivBayes, and Private-GSD. MAMA-MIA leverages knowledge of which SDG algorithm was used, allowing it to learn information about the hidden data more accurately, and orders-of-magnitude faster, than other leading attacks. We use MAMA-MIA to lend insight into existing SDG vulnerabilities. Our approach went on to win the first SNAKE (SaNitization Algorithm under attacK ... $varepsilon$) competition.
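For context on the DP side of the abstract: marginal-based DP-SDG algorithms typically protect the measured marginals themselves with calibrated noise, e.g. the standard Laplace mechanism, before fitting a generator to them. The sketch below shows that generic mechanism on a single 1-way marginal; it is an assumption-laden illustration (the function name `dp_marginal` and the unit sensitivity are choices for this example), not the noise step of MST, PrivBayes, or Private-GSD specifically.

```python
import random

def dp_marginal(counts, epsilon, sensitivity=1.0):
    """Release a 1-way marginal under epsilon-DP via the Laplace mechanism.

    counts: dict mapping attribute value -> true count in the private data.
    Assumes adding or removing one record changes each count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon suffices.
    """
    scale = sensitivity / epsilon
    # A Laplace(scale) draw is the difference of two Exp(mean=scale) draws.
    noisy = {v: c + random.expovariate(1 / scale) - random.expovariate(1 / scale)
             for v, c in counts.items()}
    # Post-processing (clamping and renormalizing) preserves the DP guarantee.
    clamped = {v: max(c, 0.0) for v, c in noisy.items()}
    total = sum(clamped.values())
    if total == 0:
        return {v: 1.0 / len(clamped) for v in clamped}  # degenerate fallback
    return {v: c / total for v, c in clamped.items()}
```

The paper's point is that even with this noise, the *structure* of which marginals a given algorithm measures, and how consistently they fit together, still leaks membership signal to an attacker who knows the algorithm.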
Problem

Research questions and friction points this paper is trying to address.

Privacy leaks in marginal-based synthetic data generation
Efficient membership inference attacks on DP SDG algorithms
Vulnerabilities in MST, PrivBayes, and Private-GSD algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel membership inference attack MAMA-MIA
Targets marginal-based synthetic data algorithms
Outperforms prior attacks in speed and accuracy