🤖 AI Summary
This work reveals that large language model (LLM)-driven web agents are vulnerable to psychology-informed, interface-based implicit prompt injection attacks in dynamic web environments, leading to task deviation. To systematically study this threat, the authors introduce TRAP—the first benchmark tailored to realistic web tasks—and propose a modular, high-fidelity website cloning framework that enables controllable modeling and evaluation of social-engineering-style attacks. Experiments across six state-of-the-art models—including GPT-5 and DeepSeek-R1—demonstrate that, on average, 25% of tasks are successfully redirected (ranging from 13% for GPT-5 to 43% for DeepSeek-R1), with minor interface modifications doubling attack success rates. This study is the first to systematically identify, formalize, and quantify the structural vulnerability of web agents to psychological persuasion mechanisms. It provides both theoretical foundations and an empirical benchmark for designing robust, human-aware web automation systems.
📝 Abstract
Web-based agents powered by large language models are increasingly used for tasks such as email management or professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP), an evaluation for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25% of tasks on average (13% for GPT-5 to 43% for DeepSeek-R1), with small interface or contextual changes often doubling success rates and revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, allowing for further benchmark expansion.