CLIOPATRA: Extracting Private Information from LLM Insights

📅 2026-03-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes CLIOPATRA, the first practical attack against privacy-preserving large language model (LLM) analytics systems such as Anthropic's Clio. Despite employing multi-layered heuristic safeguards, these systems remain vulnerable to sensitive information leakage. CLIOPATRA combines adversarial prompt injection, PII de-identification bypass, clustering-based filter evasion, and output analysis to infer a target user's medical history with 39% success using only basic demographic data and a single symptom; the success rate approaches 100% with enhanced prior knowledge. The attack exposes fundamental fragilities in current multi-layered defense mechanisms and evades existing LLM auditing and detection techniques, highlighting critical gaps in the privacy guarantees of ostensibly secure LLM-based analytical frameworks.

๐Ÿ“ Abstract
As AI assistants become widely used, privacy-aware platforms like Anthropic's Clio have been introduced to generate insights from real-world AI use. Clio's privacy protections rely on layering multiple heuristic techniques together, including PII redaction, clustering, filtering, and LLM-based privacy auditing. In this paper, we put these claims to the test by presenting CLIOPATRA, the first privacy attack against "privacy-preserving" LLM insight systems. The attack involves a realistic adversary that carefully designs and inserts malicious chats into the system to break multiple layers of privacy protections and induce the leakage of sensitive information from a target user's chat. We evaluated CLIOPATRA on synthetically generated medical target chats, demonstrating that an adversary who knows only the basic demographics of a target user and a single symptom can successfully extract the user's medical history in 39% of cases by just inspecting Clio's output. Furthermore, CLIOPATRA can reach close to 100% when Clio is configured with other state-of-the-art models and the adversary's knowledge of the target user is increased. We also show that existing ad hoc mitigations, such as LLM-based privacy auditing, are unreliable and fail to detect major leaks. Our findings indicate that even when layered, current heuristic protections are insufficient to adequately protect user data in LLM-based analysis systems.
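The abstract describes a layered heuristic pipeline (PII redaction, clustering, size-based filtering) and an adversary who injects crafted chats to push a target's topic past the filters. The toy sketch below illustrates that general idea only; every name and heuristic here (`redact`, `summarize_clusters`, the regexes, the keyword "clustering") is hypothetical and is not Clio's or CLIOPATRA's actual implementation.

```python
import re
from collections import Counter

# Hypothetical PII redaction layer: regex placeholders (not Clio's real rules).
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.]+@[\w.]+\.\w+\b"), "[EMAIL]"),
]

def redact(text: str) -> str:
    """Replace pattern matches with placeholders (heuristic, easily bypassed)."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def summarize_clusters(chats: list[str], min_cluster_size: int = 3) -> dict:
    """Group chats by a naive topic key and drop small clusters.

    The size threshold is the 'filtering' layer: topics mentioned by too
    few users are suppressed so no individual's chat surfaces alone.
    """
    topics = Counter()
    for chat in chats:
        # Naive keyword key stands in for LLM-based clustering.
        topic = "medical" if "symptom" in chat.lower() else "other"
        topics[topic] += 1
    return {t: n for t, n in topics.items() if n >= min_cluster_size}

# An injection-style adversary defeats the size threshold by adding chats
# that land in the target's cluster, pushing it over min_cluster_size so
# the topic surfaces in the system's output.
honest = ["I have a rare symptom, contact me at jane@example.com"]
injected = ["fake symptom chat one", "fake symptom chat two"]

print(summarize_clusters([redact(c) for c in honest]))             # filtered out
print(summarize_clusters([redact(c) for c in honest + injected]))  # surfaces
```

The point of the sketch is the abstract's core claim: each layer is a heuristic, so an adversary who controls some inputs can steer the aggregate output until a lone user's topic clears every threshold.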
Problem

Research questions and friction points this paper is trying to address.

privacy leakage
LLM insights
PII extraction
adversarial attack
privacy-preserving systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy attack
LLM insights
PII redaction
adversarial prompting
privacy auditing