Secret-Protected Evolution for Differentially Private Synthetic Text Generation

📅 2025-10-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing differentially private text generation methods employ uniform noise injection, over-protecting non-sensitive content and thereby degrading utility and increasing computational overhead. To address this, we propose Secret-Protected Evolution (SecPE), the first framework to introduce a *(p,r)*-secret protection theory that relaxes the Gaussian differential privacy assumption, enabling fine-grained privacy allocation—strong protection for sensitive content and minimal perturbation for non-sensitive content. SecPE integrates a private evolutionary framework with a selective noise injection mechanism, jointly optimizing generation quality using fidelity metrics such as Fréchet Inception Distance (FID). Evaluated on OpenReview, PubMed, and Yelp datasets, SecPE achieves significantly lower FID scores, higher downstream task accuracy, and equivalent privacy guarantees with substantially less noise—effectively balancing privacy preservation, utility, and computational efficiency.
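The selective noise injection idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual algorithm: it assumes a private-evolution-style voting histogram (as in Aug-PE) and adds Gaussian noise only to the counts flagged as sensitive, leaving non-sensitive counts unperturbed. The function name and the mask-based interface are hypothetical.

```python
import numpy as np

def selective_noise_histogram(votes, sensitive_mask, sigma, rng=None):
    """Sketch of secret-aware noise injection on a voting histogram.

    votes          : per-candidate vote counts (private evolution style).
    sensitive_mask : boolean array, True where a candidate derives from
                     sensitive (secret) content and needs protection.
    sigma          : Gaussian noise scale applied only to protected entries.
    """
    rng = rng or np.random.default_rng(0)
    noisy = np.asarray(votes, dtype=float).copy()
    # Perturb only the sensitive entries; non-sensitive counts stay exact,
    # which is where the utility gain over uniform noise comes from.
    noisy[sensitive_mask] += rng.normal(0.0, sigma, size=int(sensitive_mask.sum()))
    return noisy

# Example: candidates 0 and 2 are sensitive; 1 and 3 are released exactly.
votes = [10, 4, 7, 1]
mask = np.array([True, False, True, False])
noisy = selective_noise_histogram(votes, mask, sigma=1.0)
```

Under uniform Gaussian DP, every entry would receive noise of scale `sigma`; here only the masked entries do, which is the intuition behind the claimed lower FID at the same protection level for the secret portion.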

📝 Abstract
Text data has become extremely valuable for building large language models (LLMs) and even for progress toward artificial general intelligence (AGI). Much high-quality text in the real world is private and cannot be freely used due to privacy concerns. Therefore, differentially private (DP) synthetic text generation has been proposed, aiming to produce high-utility synthetic data while protecting sensitive information. However, existing DP synthetic text generation imposes uniform guarantees that often overprotect non-sensitive content, resulting in substantial utility loss and computational overhead. To address this, we propose Secret-Protected Evolution (SecPE), a novel framework that extends private evolution with secret-aware protection. Theoretically, we show that SecPE satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, a relaxation of Gaussian DP that enables tighter utility-privacy trade-offs, while also substantially reducing computational complexity relative to baseline methods. Empirically, across the OpenReview, PubMed, and Yelp benchmarks, SecPE consistently achieves lower Fréchet Inception Distance (FID) and higher downstream task accuracy than GDP-based Aug-PE baselines, while requiring less noise to attain the same level of protection. Our results highlight that secret-aware guarantees can unlock more practical and effective privacy-preserving synthetic text generation.
Problem

Research questions and friction points this paper is trying to address.

Protecting sensitive text while generating synthetic data
Reducing utility loss from uniform privacy guarantees
Improving computational efficiency in private text generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Secret-Protected Evolution framework for text generation
Relaxes Gaussian DP with secret-aware protection mechanism
Reduces computational complexity while improving privacy-utility tradeoff