AI Summary
Existing differentially private in-context learning (DP-ICL) methods overlook the privacy risks inherent in the similarity-search phase. This paper is the first to bring nearest-neighbor retrieval into the differential privacy framework, proposing a dynamic privacy-filtering mechanism that tracks and bounds the cumulative privacy cost in real time during relevant-example retrieval, thereby ensuring central differential privacy guarantees. The approach combines database-driven approximate nearest-neighbor search with privacy-aware filtering, strictly adhering to a pre-specified privacy budget without compromising retrieval quality. Experiments on text classification and document question answering show that the method significantly outperforms existing DP-ICL baselines, improving model utility by up to 12.6% under identical privacy budgets and thus achieving a superior privacy-utility trade-off.
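The core idea above, retrieval gated by a filter that tracks cumulative privacy spend, can be sketched as follows. This is an illustrative toy, not the paper's implementation: the class and function names (`PrivacyFilter`, `private_retrieve`), the per-example epsilon cost, and the use of basic sequential composition are all assumptions; the paper's actual accounting and retrieval index may differ, and a real system would use an approximate nearest-neighbor library rather than the exact scan shown here.

```python
import heapq
import math

class PrivacyFilter:
    """Tracks cumulative privacy cost against a fixed central DP budget.
    Illustrative only; the paper's accountant may be more sophisticated."""

    def __init__(self, epsilon_budget: float):
        self.epsilon_budget = epsilon_budget
        self.spent = 0.0

    def try_charge(self, epsilon_cost: float) -> bool:
        # Basic sequential composition: per-selection costs add up.
        if self.spent + epsilon_cost > self.epsilon_budget:
            return False
        self.spent += epsilon_cost
        return True

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def private_retrieve(query, database, k, filt, per_example_epsilon):
    """Return up to k nearest neighbors whose cumulative privacy cost
    stays within the filter's budget."""
    # Exact scan for clarity; a production system would use an
    # approximate nearest-neighbor index over the context database.
    ranked = heapq.nlargest(
        len(database),
        database,
        key=lambda ex: cosine_similarity(query, ex["embedding"]),
    )
    selected = []
    for ex in ranked:
        if len(selected) == k:
            break
        if not filt.try_charge(per_example_epsilon):
            break  # budget exhausted: stop retrieval early
        selected.append(ex)
    return selected
```

With a budget of 1.0 and a cost of 0.4 per selected example, the filter admits two examples and then halts retrieval, which is the dynamic bounding behavior described above.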
Abstract
Differentially private in-context learning (DP-ICL) has recently become an active research topic due to the inherent privacy risks of in-context learning. However, existing approaches overlook a critical component of modern large language model (LLM) pipelines: the similarity search used to retrieve relevant context data. In this work, we introduce a DP framework for in-context learning that integrates nearest-neighbor search over relevant examples in a privacy-aware manner. To achieve this, we employ nearest-neighbor retrieval from a database of context data, combined with a privacy filter that tracks the cumulative privacy cost of selected samples to ensure adherence to a central differential privacy budget. Our method outperforms existing baselines by a substantial margin across all evaluated benchmarks, achieving more favorable privacy-utility trade-offs. Experimental results on text classification and document question answering show a clear advantage of the proposed method over existing baselines.