🤖 AI Summary
Recent critiques have challenged the differential privacy guarantees of PATE-GAN and PrivBayes, questioning the validity of their privacy-utility trade-offs. However, these critiques rely on restrictive assumptions, such as synthetic or overly simplistic data distributions, and on limited experimental settings, which may bias their conclusions.
Method: We propose a more general privacy-utility evaluation framework that combines privacy-game analysis with theoretical verification, and we benchmark against k-anonymity on real-world datasets without imposing distributional assumptions.
Contribution/Results: Under identical privacy budgets, both PATE-GAN and PrivBayes significantly outperform k-anonymity in statistical utility while maintaining strong differential privacy guarantees. We demonstrate that prior claims of “privacy failure” stem from flawed evaluation premises—specifically, the absence of rigorous privacy accounting and realistic data assumptions. Our empirical analysis refutes these criticisms and establishes synthetic data generation as a robust and effective privacy-enhancing technology.
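To make the privacy-game and privacy-accounting ideas concrete, here is a minimal illustrative sketch, not the article's actual game: an empirical distinguishing test against the Laplace mechanism (a standard ε-DP mechanism for counting queries). The counts, threshold event, and trial budget are all assumptions chosen for illustration. The point mirrors the critique raised above: the bound Pr[M(A) ∈ S] ≤ e^ε · Pr[M(B) ∈ S] is only promised when A and B are genuinely neighboring datasets, so a game that drops that precondition can report a "breach" where none exists.

```python
import math
import random

def laplace_noise(scale, rng):
    # Laplace(0, scale) sampled as the difference of two exponential draws.
    e1 = -scale * math.log(1.0 - rng.random())
    e2 = -scale * math.log(1.0 - rng.random())
    return e1 - e2

def empirical_privacy_ratio(count_a, count_b, epsilon, trials=200_000, seed=0):
    # Run M(x) = x + Lap(1/epsilon) on two datasets and estimate
    # Pr[M(A) in S] / Pr[M(B) in S] for the event S = {output >= threshold}.
    rng = random.Random(seed)
    threshold = (count_a + count_b) / 2.0
    hits_a = sum(count_a + laplace_noise(1.0 / epsilon, rng) >= threshold
                 for _ in range(trials))
    hits_b = sum(count_b + laplace_noise(1.0 / epsilon, rng) >= threshold
                 for _ in range(trials))
    return (hits_a + 1) / (hits_b + 1)  # smoothed to avoid division by zero

epsilon = 1.0
# Neighboring datasets for a counting query: totals differing by one record.
ratio = empirical_privacy_ratio(50, 49, epsilon)
assert 1.0 < ratio <= math.exp(epsilon)  # holds because A and B are true neighbors
```

Repeating the same test with non-neighboring inputs (say counts 50 and 40) drives the ratio far above e^ε without any actual DP violation, which is why the precondition matters for any claimed counterexample.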
📝 Abstract
Synthetic data has been considered a better privacy-preserving alternative to traditionally sanitized data across various applications. However, a recent article challenges this notion, stating that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques, and that it leads to unpredictable utility loss and highly unpredictable privacy gain. The article also claims to have identified a breach in the differential privacy guarantees provided by PATE-GAN and PrivBayes. When a study claims to refute or invalidate prior findings, it is crucial to verify and validate that study. In our work, we analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment, which limits the applicability of its findings to general cases. Our analysis also revealed that the game did not satisfy a crucial precondition concerning data distributions, which contributed to the perceived violation of the differential privacy guarantees offered by PATE-GAN and PrivBayes. We also conducted a privacy-utility trade-off analysis in a more general, unconstrained environment. Our experiments demonstrate that synthetic data indeed achieves a more favorable privacy-utility trade-off than the provided implementation of k-anonymization, thereby reaffirming earlier conclusions.
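For readers less familiar with the k-anonymization baseline discussed above, the property itself is simple to state and check: every combination of quasi-identifier values must be shared by at least k records. The sketch below is illustrative only, with made-up records and column names, and is not the implementation evaluated in the paper.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    # A table is k-anonymous with respect to the quasi-identifiers if every
    # combination of their values appears in at least k records.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical generalized records (age bucketed, ZIP truncated).
rows = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "030**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "030**", "diagnosis": "asthma"},
]
assert is_k_anonymous(rows, ["age", "zip"], k=2)
assert not is_k_anonymous(rows, ["age", "zip"], k=3)
```

The generalization needed to reach a target k (bucketing ages, truncating ZIP codes) is exactly where k-anonymization pays its utility cost, which is the trade-off the paper's experiments compare against synthetic data under a fixed privacy budget.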