AI Summary
This work addresses the challenge of balancing privacy preservation and informational utility when large language models (LLMs) draft messages on users' behalf. Existing approaches are limited to either deleting or generalizing sensitive content and rely on single-turn evaluations, which underestimate privacy leakage risks in realistic multi-turn interactions. To overcome these limitations, the study formulates privacy protection as an information sufficiency task and introduces free-text pseudonymization as a third strategy. It further proposes a conversational evaluation protocol based on multi-turn probing. Systematic evaluation across 792 scenarios involving seven prominent LLMs demonstrates that pseudonymization achieves the best overall trade-off between privacy and utility. Moreover, single-message assessments significantly understate risk, and generalization-based methods suffer up to a 16.3-percentage-point drop in privacy protection under multi-turn probing.
Abstract
LLM agents increasingly draft messages on behalf of users, yet users routinely overshare sensitive information and disagree on what counts as private. Existing systems support only suppression (omitting sensitive information) and generalization (replacing information with an abstraction), and are typically evaluated on single isolated messages, leaving both the strategy space and evaluation setting incomplete. We formalize privacy-preserving LLM communication as an **Information Sufficiency (IS)** task, introduce **free-text pseudonymization** as a third strategy that replaces sensitive attributes with functionally equivalent alternatives, and propose a **conversational evaluation protocol** that assesses strategies under realistic multi-turn follow-up pressure. Across 792 scenarios spanning three power-relation types (institutional, peer, intimate) and three sensitivity categories (discrimination risk, social cost, boundary), we evaluate seven frontier LLMs on privacy at two granularities, covertness, and utility. Pseudonymization yields the strongest privacy–utility tradeoff overall, and single-message evaluation systematically underestimates leakage, with generalization losing up to 16.3 percentage points of privacy under follow-up.
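The three strategies named in the abstract (suppression, generalization, and free-text pseudonymization) can be sketched as simple string rewrites. The sketch below is purely illustrative: the example message, the sensitive spans, and the substitutes are hypothetical and are not drawn from the paper's 792 scenarios or its actual rewriting pipeline.

```python
# Illustrative sketch of the three privacy strategies as string rewrites.
# All inputs below are hypothetical examples, not the paper's data.

def suppress(message: str, sensitive: str) -> str:
    """Suppression: omit the sensitive span entirely."""
    return message.replace(sensitive, "").strip()

def generalize(message: str, sensitive: str, abstraction: str) -> str:
    """Generalization: replace the sensitive span with a broader abstraction."""
    return message.replace(sensitive, abstraction)

def pseudonymize(message: str, sensitive: str, substitute: str) -> str:
    """Free-text pseudonymization: swap in a functionally equivalent
    alternative that still justifies the request being made."""
    return message.replace(sensitive, substitute)

msg = "I need Friday off for my chemotherapy appointment."

suppressed = suppress(msg, " for my chemotherapy appointment")
generalized = generalize(msg, "chemotherapy appointment", "medical appointment")
pseudonymized = pseudonymize(msg, "chemotherapy appointment",
                             "physical-therapy session")
```

The trade-off the paper measures is visible even here: suppression removes the justification (hurting utility), generalization keeps an abstract reason but invites follow-up questions, while pseudonymization supplies a concrete, functionally equivalent reason that reveals nothing about the true condition.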