🤖 AI Summary
Privacy leakage risks arise in large language model (LLM) in-context learning (ICL) when prompts contain private or proprietary data; theoretical differential privacy (DP) bounds are overly loose, and DP-ICL implementations are error-prone, necessitating practical auditing tools.
Method: We propose the first tight and efficient ICL privacy auditing framework, unifying classification and generation tasks by reducing both to a binary membership inference problem. It integrates Gaussian differential privacy conversion with a search for the private voting configuration that maximizes detection sensitivity, and supports both black-box and white-box threat models.
Results: Experiments demonstrate strong alignment between measured privacy leakage and theoretical DP budgets on classification tasks, while observed leakage in generation tasks remains substantially below theoretical upper bounds. This validates the framework's reliability, deployability, and practical utility as a trustworthy privacy auditing tool for ICL.
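The core conversion the summary describes — translating a membership inference attack's error rates into an empirical privacy guarantee via Gaussian DP — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses Dong et al.'s standard μ-GDP trade-off curve and its (ε, δ) conversion, and assumes the attack performs better than random guessing (so the implied μ is positive); all function names are ours.

```python
import math
from scipy.stats import norm
from scipy.optimize import brentq

def empirical_mu(fpr, fnr):
    """GDP parameter mu implied by an attack's false-positive rate (fpr)
    and false-negative rate (fnr), via the Gaussian trade-off curve.
    Assumes fpr + fnr < 1, i.e. the attack beats random guessing."""
    return norm.ppf(1 - fpr) - norm.ppf(fnr)

def gdp_delta(mu, eps):
    """delta(eps) curve of a mu-GDP mechanism (standard GDP-to-DP formula)."""
    return norm.cdf(-eps / mu + mu / 2) - math.exp(eps) * norm.cdf(-eps / mu - mu / 2)

def empirical_epsilon(fpr, fnr, delta=1e-5):
    """Smallest eps such that the observed attack performance is consistent
    with (eps, delta)-DP, obtained by inverting delta(eps) numerically."""
    mu = empirical_mu(fpr, fnr)
    return brentq(lambda e: gdp_delta(mu, e) - delta, 1e-9, 100.0)
```

For example, an attack with 5% false positives and 5% false negatives implies μ = 2·Φ⁻¹(0.95) ≈ 3.29, which `empirical_epsilon` then converts to a concrete ε at the chosen δ.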
📝 Abstract
Large language models (LLMs) perform in-context learning (ICL) by adapting to tasks from prompt demonstrations, which in practice often contain private or proprietary data. Although differential privacy (DP) with private voting is a pragmatic mitigation, DP-ICL implementations are error-prone, and worst-case DP bounds may substantially overestimate actual leakage, calling for practical auditing tools. We present a tight and efficient privacy auditing framework for DP-ICL systems that runs membership inference attacks and translates their success rates into empirical privacy guarantees using Gaussian DP. Our analysis of the private voting mechanism identifies vote configurations that maximize the auditing signal, guiding the design of audit queries that reliably reveal whether a canary demonstration is present in the context. The framework supports both black-box (API-only) and white-box (internal vote) threat models, and unifies auditing for classification and generation by reducing both to a binary decision problem. Experiments on standard text classification and generation benchmarks show that our empirical leakage estimates closely match theoretical DP budgets on classification tasks and are consistently lower on generation tasks due to conservative embedding-sensitivity bounds, making our framework a practical privacy auditor and verifier for real-world DP-ICL deployments.
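To make the audit loop in the abstract concrete, the sketch below runs a Gaussian-noised private voting mechanism (Report-Noisy-Max style) on neighbouring vote histograms, with and without one canary vote, and measures the resulting membership inference rates. This is a hypothetical illustration under assumed names and hyperparameters, not the paper's actual mechanism; it does reflect the abstract's observation that certain vote configurations, such as a tie that the canary breaks, maximize the auditing signal.

```python
import numpy as np

def private_vote(counts, sigma, rng):
    """Release the argmax of a Gaussian-noised vote histogram
    (a Report-Noisy-Max-style private voting mechanism)."""
    noisy = np.asarray(counts, dtype=float) + rng.normal(0.0, sigma, size=len(counts))
    return int(np.argmax(noisy))

def audit_rates(counts_without, counts_with, canary_label, sigma,
                trials=10_000, seed=0):
    """Query the mechanism on neighbouring histograms (canary absent vs.
    present) and report the FPR and TPR of the membership rule
    'guess member iff the canary's label wins the vote'."""
    rng = np.random.default_rng(seed)
    fpr = sum(private_vote(counts_without, sigma, rng) == canary_label
              for _ in range(trials)) / trials
    tpr = sum(private_vote(counts_with, sigma, rng) == canary_label
              for _ in range(trials)) / trials
    return fpr, tpr

# A tied vote is the most audit-sensitive configuration: the single canary
# vote flips the plurality, so membership maximally shifts the output.
fpr, tpr = audit_rates([5, 5], [5, 6], canary_label=1, sigma=1.0)
```

The measured (FPR, TPR) pair from such queries is exactly what a Gaussian-DP conversion turns into an empirical privacy guarantee; in the white-box threat model the auditor would read the vote counts directly instead of observing only the released label.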