Differentially Private Synthetic Data via APIs 3: Using Simulators Instead of Foundation Model

📅 2025-02-08

🤖 AI Summary

Existing Private Evolution (PE) frameworks for differentially private synthetic data generation rely on inference APIs of large foundation models. This presumes that a suitable pre-trained model exists for the private data domain, which often fails in specialized, privacy-sensitive domains.

Method: This paper introduces Sim-PE, which plugs non-learning, domain-specific simulators (e.g., computer graphics–based image synthesis tools) into the PE pipeline as the inference API, in place of or alongside a foundation model. No fine-tuning is required. PE queries the simulator for random samples and for variations of selected samples, and the private data is used only through PE's differentially private selection step.

Contribution/Results: Evaluated with three diverse simulators on image synthesis, Sim-PE improves PE's downstream classification accuracy by up to 3× and reduces Fréchet Inception Distance (FID) by up to 80%. Using simulators and foundation models together within the same PE framework yields further gains. The implementation is publicly available as part of the open-source Private Evolution Python library.
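
Sim-PE answers PE's API calls with a simulator instead of a foundation model. In the original PE formulation those calls come in two kinds: generate random samples, and generate variations of given samples. Below is a minimal sketch of how a parametric simulator could serve both, assuming every output is fully controlled by a parameter vector. This is an illustrative assumption, not necessarily how the paper or the Private Evolution library implements it. `ToyParametricSimulator`, `render`, `random_api`, and `variation_api` are hypothetical names, and the "renderer" is a toy function rather than a graphics engine.

```python
import numpy as np


class ToyParametricSimulator:
    """Hypothetical stand-in for a domain-specific simulator (not from the paper).

    Every output is fully determined by a parameter vector in [0, 1]^d,
    much as a graphics engine's output is determined by pose, lighting, etc.
    """

    def __init__(self, num_params: int = 3, seed: int = 0):
        self.num_params = num_params
        self.rng = np.random.default_rng(seed)

    def render(self, params: np.ndarray) -> np.ndarray:
        # Deterministic toy "renderer": a fixed nonlinear map from parameters
        # to a feature vector of length 2 * num_params.
        return np.concatenate([np.sin(np.pi * params), params ** 2], axis=-1)

    def random_api(self, n: int):
        # Random-sample call: draw n parameter vectors uniformly and render them.
        params = self.rng.uniform(0.0, 1.0, size=(n, self.num_params))
        return params, self.render(params)

    def variation_api(self, params: np.ndarray, scale: float = 0.05):
        # Variation call: jitter existing parameters, keep them in range, re-render.
        noisy = np.clip(params + self.rng.normal(0.0, scale, size=params.shape), 0.0, 1.0)
        return noisy, self.render(noisy)


sim = ToyParametricSimulator()
params, samples = sim.random_api(4)
new_params, new_samples = sim.variation_api(params)
print(samples.shape, new_samples.shape)  # (4, 6) (4, 6)
```

In this sketch, variations are made in parameter space, so every variation is still a valid simulator output and no learned model is involved at any step.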

📝 Abstract

Differentially private (DP) synthetic data, which closely resembles the original private data while maintaining strong privacy guarantees, has become a key tool for unlocking the value of private data without compromising privacy. Recently, Private Evolution (PE) has emerged as a promising method for generating DP synthetic data. Unlike other training-based approaches, PE only requires access to inference APIs from foundation models, enabling it to harness the power of state-of-the-art models. However, a suitable foundation model for a specific private data domain is not always available. In this paper, we discover that the PE framework is sufficiently general to allow inference APIs beyond foundation models. Specifically, we show that simulators -- such as computer graphics-based image synthesis tools -- can also serve as effective APIs within the PE framework. This insight greatly expands the applicability of PE, enabling the use of a wide variety of domain-specific simulators for DP data synthesis. We explore the potential of this approach, named Sim-PE, in the context of image synthesis. Across three diverse simulators, Sim-PE performs well, improving the downstream classification accuracy of PE by up to 3x and reducing the FID score by up to 80%. We also show that simulators and foundation models can be easily leveraged together within the PE framework to achieve further improvements. The code is open-sourced in the Private Evolution Python library: https://github.com/microsoft/DPSDA.
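
The abstract's central point is that PE never trains on the private data. It only queries an API and uses the private data to decide which generated samples to keep. Below is a simplified sketch of that loop, following the nearest-neighbor voting scheme of the original PE work. Each private sample votes for its closest synthetic sample, Gaussian noise is added to the vote histogram, low counts are thresholded away, and the survivors are resampled and passed to the variation call. The sketch rests on several assumptions:

- Samples are compared directly as feature vectors, whereas real PE computes distances in the embedding space of an image embedding network.
- The toy `random_api`/`variation_api` functions are hypothetical stand-ins for a simulator or foundation model.
- Privacy accounting across iterations is omitted, so the `noise_multiplier` value here implies no particular (ε, δ) guarantee.

```python
import numpy as np


def dp_nn_histogram(private_emb, synthetic_emb, noise_multiplier, threshold, rng):
    """Noisy nearest-neighbor vote histogram over the current synthetic samples."""
    # Pairwise L2 distances, shape (num_private, num_synthetic).
    dists = np.linalg.norm(private_emb[:, None, :] - synthetic_emb[None, :, :], axis=-1)
    # Each private sample casts one vote for its nearest synthetic sample.
    votes = np.bincount(dists.argmin(axis=1), minlength=len(synthetic_emb)).astype(float)
    # Gaussian mechanism: adding or removing one private sample changes one
    # count by 1, so the L2 sensitivity of the histogram is 1.
    noisy = votes + rng.normal(0.0, noise_multiplier, size=votes.shape)
    # Drop counts at or below the threshold; they are likely noise.
    return np.where(noisy > threshold, noisy, 0.0)


def private_evolution(private_emb, random_api, variation_api, num_samples,
                      num_iters, noise_multiplier, threshold, seed=0):
    rng = np.random.default_rng(seed)
    samples = random_api(num_samples)  # initial population from the API
    for _ in range(num_iters):
        hist = dp_nn_histogram(private_emb, samples, noise_multiplier, threshold, rng)
        # Resample in proportion to the noisy votes (uniform if everything was dropped).
        total = hist.sum()
        probs = hist / total if total > 0 else np.full(len(samples), 1.0 / len(samples))
        idx = rng.choice(len(samples), size=num_samples, p=probs)
        samples = variation_api(samples[idx])  # ask the API for variations
    return samples


# Toy usage: 2-D "samples" that double as their own embeddings.
api_rng = np.random.default_rng(1)


def random_api(n):
    return api_rng.uniform(-5.0, 5.0, size=(n, 2))


def variation_api(x):
    return x + api_rng.normal(0.0, 0.3, size=x.shape)


private = np.random.default_rng(2).normal(loc=[2.0, -1.0], scale=0.5, size=(1000, 2))
synthetic = private_evolution(private, random_api, variation_api, num_samples=100,
                              num_iters=10, noise_multiplier=2.0, threshold=4.0)
print(synthetic.mean(axis=0))  # typically lands near the private mean (2, -1)
```

Because only the noisy histogram touches the private data, the source behind `random_api` and `variation_api` (a simulator, a foundation model, or both) can be swapped without changing the privacy mechanism.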

Problem

Research questions and friction points this paper is trying to address.

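Generating DP synthetic data that closely resembles private data while keeping strong privacy guarantees

PE depends on a foundation model suited to the private data domain, which is not always available

Whether non-learning APIs such as domain-specific simulators can stand in for foundation models within PE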

Innovation

Methods, ideas, or system contributions that make the work stand out.

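Domain-specific simulators serve as PE's inference APIs in place of foundation models

Simulators and foundation models can be combined within the same PE framework

Sim-PE improves PE's downstream classification accuracy by up to 3x and reduces FID by up to 80%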

🔎 Similar Papers

Differentially Private Synthetic Data via Foundation Model APIs 1: Images