Network-Level Prompt and Trait Leakage in Local Research Agents

📅 2025-08-27
🤖 AI Summary
This paper reveals a serious privacy risk in locally deployed web research agents (WRAs). Each query makes an agent visit 70–140 domains in temporally correlated bursts, and these access patterns form distinctive behavioral fingerprints. As a result, passive network adversaries such as ISPs can infer a user's query intent and latent traits from network metadata alone (visited IP addresses and their timings). The authors build a fingerprinting attack based on traffic timing and introduce OBELS, a behavioral metric for assessing how similar an inferred prompt is to the original. The evaluation uses a dataset of WRA traces built from real user search queries and from queries generated by synthetic personas. Using OBELS, the authors show that the attack recovers over 73% of the functional and domain knowledge of user prompts; in a multi-session setting, it recovers up to 19 of 32 latent traits with high accuracy, and it remains effective under partial observability and noise. The proposed mitigations, which constrain domain diversity or obfuscate traces, reduce attack effectiveness by 29% on average with negligible impact on utility.
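The core idea, that an on-path observer can turn a stream of timed connections into a fingerprint, can be illustrated with a small sketch. This is not the paper's method: the abstract neither describes the attack pipeline nor defines OBELS, so the code below substitutes two simple, clearly hypothetical steps. It assumes the observer has already mapped destination IP addresses to domain names, splits the connection log into per-query bursts wherever the idle gap exceeds an assumed threshold (`max_gap`), and then matches each burst to a labeled reference trace by Jaccard similarity of domain sets. The names `Flow`, `segment_bursts`, `jaccard` and `best_match`, the 5-second gap, and the example domains and labels are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed connection (hypothetical record format)."""
    timestamp: float  # seconds since start of capture
    domain: str       # assumption: destination IP already mapped to a domain


def segment_bursts(flows, max_gap=5.0):
    """Split a connection log into bursts wherever the idle gap exceeds max_gap seconds."""
    bursts, current, last_ts = [], [], None
    for flow in sorted(flows, key=lambda f: f.timestamp):
        if last_ts is not None and flow.timestamp - last_ts > max_gap:
            bursts.append(current)  # gap too long: close the current burst
            current = []
        current.append(flow)
        last_ts = flow.timestamp
    if current:
        bursts.append(current)
    return bursts


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(burst, references):
    """Return the label whose reference domain set is most similar to the burst's domains."""
    observed = {f.domain for f in burst}
    return max(references, key=lambda label: jaccard(observed, references[label]))


# Illustrative usage with made-up traces and labels.
flows = [
    Flow(0.0, "arxiv.org"),
    Flow(0.8, "scholar.google.com"),
    Flow(1.5, "pubmed.ncbi.nlm.nih.gov"),
    Flow(60.0, "espn.com"),
    Flow(61.0, "nba.com"),
]
references = {
    "medical research": {"pubmed.ncbi.nlm.nih.gov", "arxiv.org", "who.int"},
    "sports": {"espn.com", "nba.com"},
}
bursts = segment_bursts(flows)
for burst in bursts:
    print(best_match(burst, references))
```

The 58.5-second pause splits the log into two bursts, which match "medical research" and "sports" respectively. A real attack would need a far richer similarity model than set overlap; the sketch only shows why many correlated domain visits per query make traces easy to tell apart.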

📝 Abstract
We show that Web and Research Agents (WRAs), language model-based systems that investigate complex topics on the Internet, are vulnerable to inference attacks by passive network adversaries such as ISPs. These agents could be deployed locally by organizations and individuals for privacy, legal, or financial purposes. Unlike sporadic web browsing by humans, WRAs visit 70–140 domains with distinguishable timing correlations, enabling unique fingerprinting attacks. Specifically, we demonstrate a novel prompt and user trait leakage attack against WRAs that leverages only their network-level metadata (i.e., visited IP addresses and their timings). We start by building a new dataset of WRA traces based on user search queries and queries generated by synthetic personas. We define a behavioral metric (called OBELS) to comprehensively assess similarity between original and inferred prompts, showing that our attack recovers over 73% of the functional and domain knowledge of user prompts. In a multi-session setting, we recover up to 19 of 32 latent traits with high accuracy. Our attack remains effective under partial observability and noisy conditions. Finally, we discuss mitigation strategies that constrain domain diversity or obfuscate traces, showing negligible utility impact while reducing attack effectiveness by an average of 29%.
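The abstract names "constraining domain diversity" as one mitigation but does not say how it is implemented. One plausible reading is sketched below: before fetching, the agent filters its ranked candidate URLs to an allow-list and caps the number of distinct domains it contacts per query, so every query produces a smaller, less distinctive set of destinations. The function `constrain_domains`, the `allowed` set, the `max_domains` cap, and the example URLs are hypothetical; trace obfuscation, the other mitigation the abstract mentions, is not sketched here.

```python
from urllib.parse import urlsplit


def constrain_domains(urls, allowed, max_domains):
    """Keep ranked URLs whose host is allow-listed, using at most max_domains distinct hosts.

    Hypothetical mitigation sketch: rank order is preserved, and further URLs from an
    already chosen host are still kept because they add no new domain to the trace.
    """
    kept, chosen = [], set()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased host, or None if absent
        if host is None or host not in allowed:
            continue  # drop hosts outside the allow-list
        if host not in chosen:
            if len(chosen) >= max_domains:
                continue  # cap reached: do not introduce a new domain
            chosen.add(host)
        kept.append(url)
    return kept


# Illustrative usage with made-up candidates.
candidates = [
    "https://arxiv.org/abs/1",
    "https://blog.example.com/post",
    "https://pubmed.ncbi.nlm.nih.gov/2",
    "https://arxiv.org/abs/3",
    "https://who.int/news",
]
allowed = {"arxiv.org", "pubmed.ncbi.nlm.nih.gov", "who.int"}
kept = constrain_domains(candidates, allowed, max_domains=2)
print(kept)
```

Here the non-allow-listed blog is dropped and `who.int` is skipped because the two-domain cap is already reached, while the second `arxiv.org` page is kept. The cap trades research coverage for a less distinctive trace; tuning it against answer quality would be the main design question.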
Problem

Research questions and friction points this paper is trying to address.

Network adversaries infer prompts from Web Research Agents' metadata
Passive attackers exploit timing correlations in domain visits
Leakage attacks recover user traits and search intents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Network metadata analysis for prompt leakage
Behavioral metric OBELS for similarity assessment
Multi-session latent trait recovery technique