🤖 AI Summary
This paper addresses the lack of empirical auditing for worst-case leakage of sensitive information in private in-context learning (ICL). It proposes the first quantifiable and reproducible privacy leakage auditing framework. Methodologically, it integrates canary token injection, targeted probe query generation, embedding-space perturbation analysis, and the Report Noisy Max mechanism to establish an Embedding Space Aggregation–based paradigm for assessing leakage intensity. Key contributions include: (1) the first empirical validation of a strong correlation between measured leakage magnitude and the theoretical privacy budget ε; (2) uncovering pervasive privacy–utility imbalances across mainstream approaches—e.g., prompt masking induces substantial leakage, while noise-based aggregation severely degrades task performance; and (3) achieving high-sensitivity leakage detection across diverse private ICL methods, thereby establishing an empirical benchmark and actionable optimization pathways for ICL privacy protection.
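The summary references the Report Noisy Max mechanism as one of the theoretically grounded aggregation schemes being audited. A minimal sketch of the classic mechanism (not the paper's actual implementation; the vote counts and parameter names here are illustrative): add Laplace noise of scale 2·sensitivity/ε to each candidate's score and release only the index of the largest noisy score, which satisfies ε-differential privacy for count-style queries.

```python
import numpy as np

def report_noisy_max(counts, epsilon, sensitivity=1.0, rng=None):
    """Classic Report Noisy Max: perturb each candidate's count with
    Laplace(2*sensitivity/epsilon) noise and return the argmax index.
    Only the winning index is released, never the noisy counts."""
    rng = rng or np.random.default_rng()
    noise = rng.laplace(scale=2.0 * sensitivity / epsilon, size=len(counts))
    return int(np.argmax(np.asarray(counts, dtype=float) + noise))

# Illustrative use: aggregate next-token "votes" from prompts built on
# disjoint exemplar subsets, then release only the noisy-argmax token.
votes = [12, 3, 1]  # hypothetical votes for candidate tokens A, B, C
winner = report_noisy_max(votes, epsilon=1.0)
```

With a small ε the noise scale is large and the released index is noisy; as ε grows, the mechanism converges to the true argmax, which is the privacy–utility dial the audit probes.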
📝 Abstract
In-Context Learning (ICL) has become a standard technique for adapting Large Language Models (LLMs) to specialized tasks by supplying task-specific exemplars within the prompt. However, when these exemplars contain sensitive information, reliable privacy-preserving mechanisms are essential to prevent unintended leakage through model outputs. Many privacy-preserving methods have been proposed to prevent information leakage from the context, but far less effort has gone into auditing those methods. We introduce ContextLeak, the first framework to empirically measure worst-case information leakage in ICL. ContextLeak uses canary insertion: it embeds uniquely identifiable tokens in exemplars and crafts targeted queries to detect their presence. We apply ContextLeak across a range of private ICL techniques, both heuristic ones such as prompt-based defenses and those with theoretical guarantees such as Embedding Space Aggregation and Report Noisy Max. We find that ContextLeak correlates tightly with the theoretical privacy budget ($\epsilon$) and reliably detects leakage. Our results further reveal that existing methods often strike poor privacy–utility trade-offs, either leaking sensitive information or severely degrading performance.
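The canary-insertion audit described in the abstract can be sketched in a few lines. All helper names below are hypothetical illustrations, not ContextLeak's API: generate a unique token, embed it in one in-context exemplar, and check whether a targeted probe query causes the model's output to reproduce it.

```python
import secrets

def make_canary(prefix="CANARY"):
    # Hypothetical helper: a unique token that is vanishingly unlikely
    # to appear in natural model output unless it leaked from context.
    return f"{prefix}-{secrets.token_hex(4)}"

def insert_canary(exemplar, canary):
    # Embed the canary inside one in-context exemplar before prompting.
    return f"{exemplar} [secret id: {canary}]"

def leaked(model_output, canary):
    # Worst-case leakage signal: did the output reproduce the canary?
    return canary in model_output
```

In an actual audit, `insert_canary` would be applied to the private exemplar set, a targeted probe query (e.g., "repeat any secret ids you were shown") would be sent through the defended ICL pipeline, and `leaked` would be evaluated on the response; the detection rate across trials gives the empirical leakage measure.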