It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

📅 2025-12-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work reveals that large language model (LLM)-driven web agents are vulnerable to psychology-informed, interface-based implicit prompt injection attacks in dynamic web environments, leading to task deviation. To systematically study this threat, the authors introduce TRAP—the first benchmark tailored to realistic web tasks—and propose a modular, high-fidelity website cloning framework that enables controllable modeling and evaluation of social-engineering-style attacks. Experiments across six state-of-the-art models—including GPT-5 and DeepSeek-R1—demonstrate that, on average, 25% of tasks are successfully redirected (ranging from 13% for GPT-5 to 43% for DeepSeek-R1), with minor interface modifications doubling attack success rates. This study is the first to systematically identify, formalize, and quantify the structural vulnerability of web agents to psychological persuasion mechanisms. It provides both theoretical foundations and an empirical benchmark for designing robust, human-aware web automation systems.

Technology Category

Application Category

📝 Abstract
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25% of tasks on average (13% for GPT-5 to 43% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.
Problem

Research questions and friction points this paper is trying to address.

Evaluates susceptibility of web agents to prompt injection attacks
Measures how persuasion techniques misguide autonomous web agents
Analyzes systemic vulnerabilities in web-based agents using realistic tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces TRAP benchmark for web agent persuasion
Tests six frontier models against prompt injection attacks
Provides modular social-engineering injection framework
🔎 Similar Papers
No similar papers found.