AI Summary
This work addresses the limitations of existing prompt injection attacks against browser-based agents, which often lack stealth and degrade system usability during long-horizon tasks. The authors propose WebTrap, a mid-task hijacking attack that embeds malicious objectives into the user's original workflow through multi-step instruction fusion and context alignment, so that the agent automatically resumes the legitimate task after the attack. Because it binds the adversarial goal to the user goal rather than pitting the two against each other, WebTrap preserves system availability while substantially increasing attack success rates. Experiments on extended WASP and InjecAgent benchmarks show that WebTrap evades current defense mechanisms, exposing critical security vulnerabilities in long-horizon autonomous agent navigation.
Abstract
Browser agents are increasingly deployed in long-horizon tasks, which require executing extended action chains to accomplish user goals. However, this prolonged execution gives attackers more opportunities to inject malicious instructions. Existing prompt injection attacks against browser agents suffer from two key gaps: (1) low effectiveness, because attacks optimized for toy baselines fail to achieve end-to-end goals in real-world scenarios with complex environments and longer action sequences; (2) weak stealthiness, because most attacks pit the attack goal against the user goal, causing a significant drop in system usability under attack. To address these gaps, we propose WebTrap, a mid-task hijacking injection attack. It employs multi-step instruction fusion steering to combine both goals seamlessly, enabling the agent to resume the original user task after executing the attack goal. Furthermore, we design a context-grounded generation method that aligns the injected content with the task environment and system instructions, maximizing the hijacking success rate. Extensive experiments on two browser agent tasks, based on extended WASP and InjecAgent environments, demonstrate that our method achieves a high attack success rate while preserving the usability of the original system. We find that WebTrap exploits the agent's navigation vulnerabilities, binding the two goals so tightly that standard defense mechanisms cannot restore the system to normal operation. These findings reveal a critical vulnerability of agent systems in long-horizon tasks: they can be stealthily hijacked.
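The stealthiness claim depends on measuring two outcomes per run: whether the attacker's goal was reached and whether the user's task still completed. The sketch below shows one way to tally these outcomes. It assumes each evaluation run is logged as a pair of booleans. The record type `Episode`, the helper `summarize`, and the metric keys (`asr`, `utility_under_attack`, `stealthy_hijack`) are hypothetical illustrations. They are not taken from WebTrap, WASP, or InjecAgent, whose exact metric definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """Outcome of one agent run on an injected task (hypothetical log format)."""
    attack_goal_done: bool  # the injected (attacker) objective was executed
    user_task_done: bool    # the original user task was still completed


def rate(flags: list[bool]) -> float:
    """Fraction of True values; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Aggregate per-run outcomes into illustrative attack/usability metrics."""
    return {
        # attack success rate: attacker goal reached, regardless of the user task
        "asr": rate([e.attack_goal_done for e in episodes]),
        # utility under attack: user task completed despite the injection
        "utility_under_attack": rate([e.user_task_done for e in episodes]),
        # stealthy hijack: both goals reached in the same run
        "stealthy_hijack": rate(
            [e.attack_goal_done and e.user_task_done for e in episodes]
        ),
    }


runs = [
    Episode(True, True),    # hijacked, then resumed the user task
    Episode(True, False),   # hijacked, user task abandoned (conflicting-goal style)
    Episode(False, True),   # injection ignored
    Episode(True, True),
]
print(summarize(runs))
```

In this framing, an attack that pits the two goals against each other raises `asr` but lowers `utility_under_attack`. A mid-task hijack of the kind described above aims to keep both high, which shows up as a high `stealthy_hijack` rate.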