Data-adaptive Differentially Private Prompt Synthesis for In-Context Learning

📅 2024-10-15
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the risk of private prompt examples leaking during large language model (LLM) in-context learning (ICL), this paper proposes AdaDPSyn, a data-adaptive differentially private prompt synthesis method. The core innovation is a Precision-Focused Iterative Radius Reduction mechanism: it exploits the clustering structure of the input data to dynamically shrink the noise-aggregation radius, enabling efficient allocation of the privacy budget under formal differential privacy guarantees. The method combines adaptive noise injection, differentially private synthetic data generation, and ICL-specific prompt engineering to preserve the semantic fidelity of the synthesized examples. On standard few-shot benchmarks, the approach achieves accuracy close to the non-private baseline, substantially outperforming the existing DP few-shot generation method, and the authors present it as the first work to jointly optimize privacy protection and ICL performance.

📝 Abstract
Large Language Models (LLMs) rely on the contextual information embedded in examples/demonstrations to perform in-context learning (ICL). To mitigate the risk of LLMs potentially leaking private information contained in examples in the prompt, we introduce a novel data-adaptive differentially private algorithm called AdaDPSyn to generate synthetic examples from the private dataset and then use these synthetic examples to perform ICL. The objective of AdaDPSyn is to adaptively adjust the noise level in the data synthesis mechanism according to the inherent statistical properties of the data, thereby preserving high ICL accuracy while maintaining formal differential privacy guarantees. A key innovation in AdaDPSyn is the Precision-Focused Iterative Radius Reduction technique, which dynamically refines the aggregation radius - the scope of data grouping for noise addition - based on patterns observed in data clustering, thereby minimizing the amount of additive noise. We conduct extensive experiments on standard benchmarks and compare AdaDPSyn with the DP few-shot generation algorithm of Tang et al. (2023). The experiments demonstrate that AdaDPSyn not only outperforms DP few-shot generation, but also maintains accuracy close to that of non-private baselines, providing an effective solution for ICL with privacy protection.
Problem

Research questions and friction points this paper is trying to address.

Mitigate private information leakage in LLMs during in-context learning.
Adaptively adjust noise levels to preserve ICL accuracy and privacy.
Dynamically refine data grouping to minimize additive noise in synthesis.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-adaptive differentially private algorithm AdaDPSyn
Precision-Focused Iterative Radius Reduction technique
Synthetic examples for privacy-preserving in-context learning
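The radius-reduction idea can be illustrated with a short sketch. This is not the paper's exact algorithm: the function name, the even per-round split of the privacy budget, and the sensitivity bound for the clipped mean are all simplifying assumptions made here for illustration. The point it shows is that shrinking the aggregation radius around the current noisy center lowers the clipping bound, and with it the Gaussian noise scale added on each round.

```python
import numpy as np


def noisy_mean_with_radius_reduction(embeddings, epsilon, delta,
                                     rounds=3, shrink=0.5, rng=None):
    """Illustrative sketch (hypothetical helper, not the paper's algorithm).

    Iteratively estimate a differentially private mean of `embeddings`:
    each round keeps only points within the current radius of the noisy
    center, adds Gaussian noise calibrated to that radius, then shrinks
    the radius so later rounds add less noise.
    """
    rng = rng or np.random.default_rng()
    n, d = embeddings.shape
    center = np.zeros(d)
    # Start with a radius covering all points (data-dependent; a real DP
    # algorithm would bound this privately).
    radius = np.max(np.linalg.norm(embeddings, axis=1))
    eps_round = epsilon / rounds  # naive even budget split (assumption)

    for _ in range(rounds):
        dists = np.linalg.norm(embeddings - center, axis=1)
        inside = embeddings[dists <= radius]
        if len(inside) == 0:
            break
        # Gaussian mechanism: treat the sensitivity of the clipped mean
        # as 2 * radius / n (simplified; ignores data-dependent membership).
        sigma = (2 * radius / n) * np.sqrt(2 * np.log(1.25 / delta)) / eps_round
        center = inside.mean(axis=0) + rng.normal(0.0, sigma, size=d)
        radius *= shrink  # precision-focused radius reduction

    return center
```

Because the noise scale is proportional to the radius, later rounds concentrate the remaining budget on a tighter neighborhood of the data, which is the intuition behind minimizing additive noise via clustering structure.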
Authors

Fengyu Gao, University of Virginia (Privacy, Machine Learning)
Ruida Zhou, Amazon AGI (Information Theory, Reinforcement Learning, Generalization)
Tianhao Wang, University of Virginia
Cong Shen, University of Virginia
Jing Yang, The Pennsylvania State University