🤖 AI Summary
This work addresses the problem of unintentional disclosure of sensitive information by users during interactions with large language models (LLMs), stemming from insufficient privacy awareness. We formally introduce the concept of *contextual privacy*—retaining only the minimal, task-relevant context necessary to fulfill the user’s intent. To this end, we propose a lightweight, locally deployable real-time prompt sanitization framework that operates as an intermediary layer prior to LLM input. It integrates user intent modeling, context boundary detection, relevance classification, and dynamic rewriting to identify and reconstruct privacy-sensitive content. Evaluated on the ShareGPT dataset, our approach reduces sensitive information exposure by 62% while preserving 98.3% task completion accuracy. Our key contributions include: (1) the first formal definition of contextual privacy; (2) a client-side, real-time sanitization architecture; and (3) empirical evidence revealing indirect information leakage—a previously underexplored privacy threat in LLM interactions.
📝 Abstract
Conversational agents are increasingly woven into individuals' personal lives, yet users often underestimate the privacy risks involved. The moment users share information with these agents (e.g., LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLMs. It aims to minimize privacy risks by ensuring that users (senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LLMs (untrusted receivers). Through a formative design user study, we observe how even "privacy-conscious" users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally deployable framework that operates between users and LLMs and identifies and reformulates out-of-context information in user prompts. Our evaluation on examples from ShareGPT shows that lightweight models can effectively implement this framework: across several approaches to classifying goal-relevant information, they achieve strong gains in contextual privacy while preserving the user's intended interaction goals.
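To make the intermediary-layer idea concrete, here is a minimal rule-based sketch of a client-side prompt sanitizer: it splits a prompt into clauses, flags spans matching simple sensitive-data patterns, and redacts them unless the clause appears relevant to the stated task. The regex patterns and keyword-based relevance check are illustrative stand-ins for the paper's learned intent-modeling and relevance-classification components, not the actual method.

```python
import re

# Hypothetical sensitive-data detectors; the real framework would use
# learned models rather than regular expressions.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def sanitize(prompt: str, task_keywords: set[str]) -> str:
    """Replace sensitive spans with typed placeholders unless a clause
    looks task-relevant (here approximated by a keyword match)."""
    sanitized_clauses = []
    for clause in re.split(r"(?<=[.!?])\s+", prompt):
        relevant = any(k in clause.lower() for k in task_keywords)
        if not relevant:
            # Out-of-context clause: strip sensitive spans before the
            # prompt ever leaves the client.
            for label, pattern in SENSITIVE_PATTERNS.items():
                clause = pattern.sub(f"[{label.upper()}]", clause)
        sanitized_clauses.append(clause)
    return " ".join(sanitized_clauses)
```

For example, `sanitize("Rewrite my bio. My SSN is 123-45-6789.", {"rewrite", "bio"})` keeps the task clause intact but redacts the SSN, since it is not needed to fulfill the rewriting goal.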