PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs

📅 2025-06-04
🤖 AI Summary
To address the low fidelity of differentially private (DP) synthetic images in few-shot, privacy-sensitive domains (e.g., healthcare and industrial inspection), this paper proposes PCEvolve (Private Contrastive Evolution), an API-assisted framework. The method mines inter-class contrastive relationships in the few-shot private data and integrates them into an adapted Exponential Mechanism (EM) inside an evolution loop. This improves the privacy-utility trade-off under data scarcity, where the DP-protected similarity voting of conventional Private Evolution (PE) struggles. Image generation relies on diffusion model API calls. On four specialized datasets, PCEvolve outperforms PE and other API-assisted baselines. The code is available at https://github.com/TsingZ0/PCEvolve.

📝 Abstract
The rise of generative APIs has fueled interest in privacy-preserving synthetic data generation. While the Private Evolution (PE) algorithm generates Differential Privacy (DP) synthetic images using diffusion model APIs, it struggles with few-shot private data due to the limitations of its DP-protected similarity voting approach. In practice, the few-shot private data challenge is particularly prevalent in specialized domains like healthcare and industry. To address this challenge, we propose a novel API-assisted algorithm, Private Contrastive Evolution (PCEvolve), which iteratively mines inherent inter-class contrastive relationships in few-shot private data beyond individual data points and seamlessly integrates them into an adapted Exponential Mechanism (EM) to optimize DP's utility in an evolution loop. We conduct extensive experiments on four specialized datasets, demonstrating that PCEvolve outperforms PE and other API-assisted baselines. These results highlight the potential of leveraging API access with private data for quality evaluation, enabling the generation of high-quality DP synthetic images and paving the way for more accessible and effective privacy-preserving generative API applications. Our code is available at https://github.com/TsingZ0/PCEvolve.
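For readers unfamiliar with the Exponential Mechanism mentioned in the abstract, the following is a minimal sketch of the standard (textbook) mechanism, not the adapted version used in PCEvolve. It samples one candidate with probability proportional to exp(epsilon * score / (2 * sensitivity)). The function name, the example scores, and the parameter values are assumptions chosen for illustration.

```python
import math
import random


def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample one index with probability proportional to
    exp(epsilon * score / (2 * sensitivity)).

    Textbook form only; the adapted mechanism in the paper may differ.
    """
    scaled = [epsilon * s / (2.0 * sensitivity) for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    return index, probs


# Hypothetical utility scores for three synthetic candidates.
rng = random.Random(0)
scores = [0.1, 0.9, 0.4]
index, probs = exponential_mechanism(scores, epsilon=1.0, sensitivity=1.0, rng=rng)
```

A larger epsilon concentrates the sampling probability on the highest-scoring candidate, while a smaller epsilon moves the distribution toward uniform, which is the privacy-utility trade-off the abstract refers to.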
Problem

Research questions and friction points this paper is trying to address.

Generating DP synthetic images with few-shot private data
Improving utility of Differential Privacy in specialized domains
Mining inter-class contrastive relationships for better synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses few-shot private data for synthetic generation
Integrates contrastive relationships into Exponential Mechanism
Optimizes Differential Privacy utility via evolution loop
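To make the idea of an inter-class contrastive relationship concrete, here is a toy sketch that scores a synthetic candidate by how much closer it lies to private samples of its target class than to private samples of any other class. This is an illustrative assumption, not the scoring function defined in the paper: the margin formula, the 2-D "embeddings", and the class names are all hypothetical, and the score alone gives no DP guarantee until it is used inside a mechanism with a properly analyzed sensitivity.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_score(candidate, target_class, private_by_class):
    """Hypothetical margin score: distance to the nearest other-class
    private sample minus distance to the nearest same-class private sample.
    Positive means the candidate sits on the target class's side."""
    same = min(distance(candidate, p) for p in private_by_class[target_class])
    other = min(
        distance(candidate, p)
        for cls, samples in private_by_class.items()
        if cls != target_class
        for p in samples
    )
    return other - same


# Toy few-shot private data: two samples per class (made-up embeddings).
private_by_class = {
    "normal": [(0.0, 0.0), (0.2, 0.1)],
    "defect": [(1.0, 1.0), (0.9, 1.2)],
}
candidates = [(0.1, 0.0), (0.6, 0.6), (1.0, 1.1)]
scores = [contrastive_score(c, "normal", private_by_class) for c in candidates]
```

In this toy setup the first candidate receives a positive score for the "normal" class and the other two receive negative scores, so a score-driven selection step would favor the first.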
Jianqing Zhang
Shanghai Jiao Tong University, Institute for AI Industry Research (AIR), Tsinghua University
Yang Liu
Hong Kong Polytechnic University, Shanghai Artificial Intelligence Laboratory
Jie Fu
Stevens Institute of Technology
Yang Hua
Tianyuan Zou
Institute for AI Industry Research, Tsinghua University
CST
Jian Cao
Shanghai Jiao Tong University, Shanghai Key Laboratory of Trusted Data Circulation and Governance in Web3
Qiang Yang
Hong Kong Polytechnic University