SPILLage: Agentic Oversharing on the Web

📅 2026-02-13
🤖 AI Summary
This work addresses the unintended leakage of user privacy by large language model-driven web agents, which often disclose task-irrelevant personal information through their actions. We propose SPILLage, a framework that formalizes "natural agentic oversharing" and characterizes it with a two-dimensional taxonomy: channel (content versus behavior) and directness (explicit versus implicit). We benchmark 180 tasks on live e-commerce sites, with ground-truth annotations separating task-relevant from task-irrelevant attributes. Across 1,080 runs spanning two agentic frameworks and three backbone LLMs, behavioral oversharing exceeds content oversharing by a factor of five, and the effect persists, and can even worsen, under prompt-level mitigation. Removing task-irrelevant information before execution, however, improves task success by up to 17.9%, indicating that reducing oversharing and completing tasks successfully go together rather than trade off.

📝 Abstract
LLM-powered agents are beginning to automate users' tasks across the open web, often with access to user resources such as emails and calendars. Unlike standard LLMs answering questions in a controlled chatbot setting, web agents act "in the wild", interacting with third parties and leaving behind an action trace. Therefore, we ask the question: how do web agents handle user resources when accomplishing tasks on their behalf across live websites? In this paper, we formalize Natural Agentic Oversharing -- the unintentional disclosure of task-irrelevant user information through an agent's trace of actions on the web. We introduce SPILLage, a framework that characterizes oversharing along two dimensions: channel (content vs. behavior) and directness (explicit vs. implicit). This taxonomy reveals a critical blind spot: while prior work focuses on text leakage, web agents also overshare behaviorally through clicks, scrolls, and navigation patterns that can be monitored. We benchmark 180 tasks on live e-commerce sites with ground-truth annotations separating task-relevant from task-irrelevant attributes. Across 1,080 runs spanning two agentic frameworks and three backbone LLMs, we demonstrate that oversharing is pervasive, with behavioral oversharing dominating content oversharing by 5x. This effect persists -- and can even worsen -- under prompt-level mitigation. However, removing task-irrelevant information before execution improves task success by up to 17.9%, demonstrating that reducing oversharing improves task success. Our findings underscore that protecting privacy in web agents is a fundamental challenge, requiring a broader view of "output" that accounts for what agents do on the web, not just what they type. Our datasets and code are available at https://github.com/jrohsc/SPILLage.
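To make the two-dimensional taxonomy concrete, the sketch below shows one way the channel and directness dimensions could be represented and tallied over an annotated action trace. It is an illustration only, not the SPILLage implementation: the names `Disclosure`, `overshared`, and `count_by_cell`, the attribute strings, and the example trace are all hypothetical and do not come from the paper or its repository.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    CONTENT = "content"    # what the agent types
    BEHAVIOR = "behavior"  # what the agent does (clicks, scrolls, navigation)


class Directness(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


@dataclass(frozen=True)
class Disclosure:
    """One user attribute revealed by one agent action (hypothetical schema)."""
    attribute: str
    channel: Channel
    directness: Directness


def overshared(trace, task_relevant):
    """Keep only disclosures of attributes the task did not need."""
    return [d for d in trace if d.attribute not in task_relevant]


def count_by_cell(disclosures):
    """Tally disclosures per (channel, directness) cell of the taxonomy."""
    return Counter((d.channel, d.directness) for d in disclosures)


# Hypothetical annotated trace for a single shopping task.
trace = [
    Disclosure("shipping_address", Channel.CONTENT, Directness.EXPLICIT),
    Disclosure("medical_condition", Channel.CONTENT, Directness.EXPLICIT),
    Disclosure("medical_condition", Channel.BEHAVIOR, Directness.EXPLICIT),
    Disclosure("religion", Channel.BEHAVIOR, Directness.IMPLICIT),
]
task_relevant = {"shipping_address"}  # assumed ground-truth annotation

leaks = overshared(trace, task_relevant)
cells = count_by_cell(leaks)
behavior_leaks = sum(n for (ch, _), n in cells.items() if ch is Channel.BEHAVIOR)
content_leaks = sum(n for (ch, _), n in cells.items() if ch is Channel.CONTENT)
```

In this toy trace the shipping address is task-relevant and so is not counted, while the remaining three disclosures are split across the taxonomy cells. Keeping behavioral and content counts separate is what allows a comparison like the 5x gap reported in the abstract.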
Problem

Research questions and friction points this paper is trying to address.

agentic oversharing
web agents
privacy leakage
behavioral traces
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic oversharing
behavioral leakage
web agents
privacy in LLMs
SPILLage framework