🤖 AI Summary
Private Evolution (PE) is a training-free method for differentially private synthetic data generation. It performs well on images and text but less consistently in other domains such as tabular data, and the only existing theoretical analysis of its convergence relies on unrealistic assumptions.
Method: We develop a new, realistically grounded framework for analyzing the convergence of PE. Specifically, for $d$-dimensional datasets with $n$ points from a bounded domain, we derive a general convergence guarantee and establish an upper bound of $\tilde{O}(d(n\epsilon)^{-1/d})$ on the expected 1-Wasserstein error. We further connect PE to the Private Signed Measure Mechanism.
Results: Our analysis proves worst-case convergence as $n \to \infty$, and simulations show that the theoretical findings are relevant in practice. This work strengthens the theoretical grounding of PE and helps explain its practical behavior, including on tabular data.
📝 Abstract
Private Evolution (PE) is a promising training-free method for differentially private (DP) synthetic data generation. While it achieves strong performance in some domains (e.g., images and text), its behavior in others (e.g., tabular data) is less consistent. To date, the only theoretical analysis of the convergence of PE depends on unrealistic assumptions about both the algorithm's behavior and the structure of the sensitive dataset. In this work, we develop a new theoretical framework to explain PE's practical behavior and identify sufficient conditions for its convergence. For $d$-dimensional sensitive datasets with $n$ data points from a bounded domain, we prove that PE produces an $(\epsilon, \delta)$-DP synthetic dataset with expected 1-Wasserstein distance of order $\tilde{O}(d(n\epsilon)^{-1/d})$ from the original, establishing worst-case convergence of the algorithm as $n \to \infty$. Our analysis extends to general Banach spaces as well. We also connect PE to the Private Signed Measure Mechanism, a method for DP synthetic data generation that has thus far not seen much practical adoption. We demonstrate the practical relevance of our theoretical findings in simulations.
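To get a feel for how the stated rate scales, the minimal sketch below evaluates only the leading term $d(n\epsilon)^{-1/d}$ for a few values of $n$ and $d$. It is an illustration under assumptions, not the paper's code: the function name `pe_rate_leading_term` is hypothetical, and the constants and logarithmic factors hidden by $\tilde{O}$ are dropped, so the numbers are meaningful only relative to one another.

```python
def pe_rate_leading_term(n: int, eps: float, d: int) -> float:
    """Leading term d * (n * eps) ** (-1 / d) of the stated bound.

    Assumption: constants and log factors hidden by the O-tilde are ignored,
    so this is a scaling illustration, not an actual error value.
    """
    if n <= 0 or eps <= 0 or d <= 0:
        raise ValueError("n, eps and d must all be positive")
    return d * (n * eps) ** (-1.0 / d)


if __name__ == "__main__":
    eps = 1.0  # illustrative privacy budget, not taken from the paper
    for d in (1, 2, 5, 10):
        for n in (10**3, 10**6):
            value = pe_rate_leading_term(n, eps, d)
            print(f"d={d:2d}  n={n:>8d}  leading term={value:.4g}")
```

For any fixed $d$ the term decreases as $n$ grows, consistent with convergence as $n \to \infty$, but the exponent $-1/d$ makes the decrease much slower in higher dimensions.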