🤖 AI Summary
This paper shows that web and research agents (WRAs), even when deployed locally, leak private information to passive network adversaries such as ISPs. A WRA visits 70–140 domains per query with distinguishable timing correlations, and this pattern forms a behavioral fingerprint from which an adversary can infer the user's prompt and latent traits using network metadata alone (visited IP addresses and their timings). The authors build a dataset of WRA traces from user search queries and queries generated by synthetic personas, and introduce OBELS, a behavioral metric for assessing the similarity between original and inferred prompts. The attack recovers over 73% of the functional and domain knowledge of user prompts and, in a multi-session setting, up to 19 of 32 latent traits with high accuracy. It remains effective under partial observability and noise. Mitigations that constrain domain diversity or obfuscate traces reduce attack effectiveness by 29% on average with negligible utility impact.
📝 Abstract
We show that Web and Research Agents (WRAs), language model-based systems that investigate complex topics on the Internet, are vulnerable to inference attacks by passive network adversaries such as ISPs. These agents could be deployed locally by organizations and individuals for privacy, legal, or financial purposes. Unlike sporadic web browsing by humans, WRAs visit 70–140 domains with distinguishable timing correlations, enabling unique fingerprinting attacks.
Specifically, we demonstrate a novel prompt and user trait leakage attack against WRAs that leverages only their network-level metadata (i.e., visited IP addresses and their timings). We start by building a new dataset of WRA traces based on user search queries and queries generated by synthetic personas. We then define a behavioral metric, OBELS, to comprehensively assess the similarity between original and inferred prompts, and show that our attack recovers over 73% of the functional and domain knowledge of user prompts. Extending to a multi-session setting, we recover up to 19 of 32 latent traits with high accuracy. Our attack remains effective under partial observability and noisy conditions. Finally, we discuss mitigation strategies that constrain domain diversity or obfuscate traces, showing negligible utility impact while reducing attack effectiveness by an average of 29%.
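To make the threat model concrete, the sketch below shows how little a passive observer needs: a list of (timestamp, domain) pairs. It is a toy illustration only, not the paper's attack and not the OBELS metric. It matches an observed trace to a topic by plain Jaccard overlap of domain sets and ignores timing entirely. All names (`Trace`, `closest_topic`), the reference topics, and the `.example` domains are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A trace as a passive observer might record it:
# (seconds since session start, visited domain). Hypothetical format.
Trace = List[Tuple[float, str]]


def domain_set(trace: Trace) -> Set[str]:
    """Collapse a trace to the set of distinct domains it touched."""
    return {domain for _, domain in trace}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_topic(observed: Trace, references: Dict[str, Set[str]]) -> Tuple[str, float]:
    """Return the reference topic whose domain set best overlaps the observed trace."""
    seen = domain_set(observed)
    scored = ((topic, jaccard(seen, doms)) for topic, doms in references.items())
    return max(scored, key=lambda pair: pair[1])


# Made-up reference profiles an adversary might have collected offline.
references = {
    "health": {"nih.example", "who.example", "clinic.example"},
    "finance": {"sec.example", "bank.example", "markets.example"},
}

# Made-up observed trace: metadata only, no page contents.
observed = [(0.0, "who.example"), (1.2, "clinic.example"), (2.5, "news.example")]

topic, score = closest_topic(observed, references)
print(topic, score)
```

Even this crude overlap score separates the two made-up topics, which hints at why traces spanning 70–140 domains with correlated timings are far more identifying. A realistic attack would also have to cope with partial observability and noise, which this sketch does not model.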