🤖 AI Summary
This work investigates the theoretical mechanisms by which synthetic data amplifies differential privacy (DP) guarantees: specifically, whether and how the privacy protection of an original model improves when its outputs are replaced with synthetic data generated by a *hidden* generative model.
Method: Focusing on linear regression, we rigorously analyze privacy amplification under two generation paradigms: (i) synthetic data sampled from the model at random inputs, and (ii) deterministic generation of a single synthetic point at an adversary-controlled seed. The analysis combines DP theory with the closed-form structure of linear regression to derive tight privacy leakage bounds.
Contribution/Results: We establish a formal proof that finite synthetic datasets generated from random inputs strictly amplify DP beyond the original model's guarantee, whereas seed-controlled single-point synthesis can leak as much information as releasing the model itself. The analysis yields matching positive (amplification) and negative (collapse) boundaries, and the linear regression framework is intended as a foundation for deriving bounds for more general generative models.
📝 Abstract
Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this phenomenon, a rigorous theoretical understanding is still lacking. In this paper, we investigate this question through the well-understood framework of linear regression. First, we establish negative results showing that if an adversary controls the seed of the generative model, a single synthetic data point can leak as much information as releasing the model itself. Conversely, we show that when synthetic data is generated from random inputs, releasing a limited number of synthetic data points amplifies privacy beyond the model's inherent guarantees. We believe our findings in linear regression can serve as a foundation for deriving more general bounds in the future.
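The contrast between the two regimes in the abstract can be made concrete with a toy sketch. The names, noise model, and one-dimensional setup below are illustrative assumptions, not the paper's exact construction: a hypothetical DP-trained weight `w_hat` plays the role of the hidden model, and we compare an adversary-chosen deterministic input against randomly drawn inputs with noisy outputs.

```python
import numpy as np

# Hypothetical private model: a 1-D linear regression weight w_hat,
# assumed to already carry a DP guarantee from its training procedure.
# sigma is illustrative label noise used when sampling synthetic outputs.
w_hat = 1.37
sigma = 0.5
rng = np.random.default_rng(0)

def synth_point(x, noisy=True):
    """Generate one synthetic point (x, y) from the hidden model."""
    y = w_hat * x + (sigma * rng.standard_normal() if noisy else 0.0)
    return x, y

# Negative regime (seed-controlled): if the adversary fixes the input
# and generation is deterministic, a single point reveals w_hat exactly,
# so releasing the point is as informative as releasing the model.
x, y = synth_point(1.0, noisy=False)
recovered = y / x  # equals w_hat

# Positive regime (random inputs): inputs are drawn at random and
# outputs are noised, so each released point gives only a noisy view
# of w_hat and a finite sample leaks less than the model itself.
xs = rng.uniform(-1.0, 1.0, size=5)
pts = [synth_point(xi) for xi in xs]
```

This is only a sketch of the threat models; the paper's results quantify how much privacy the random-input regime actually gains, which depends on the number of released points.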