EnvInjection: Environmental Prompt Injection Attack to Multi-modal Web Agents

📅 2025-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multimodal large language model (MLLM)-based web agents are vulnerable to environmental prompt injection attacks, yet existing attacks are limited in effectiveness or stealthiness, or are impractical in real-world settings. Method: The paper proposes EnvInjection, an attack that adds a perturbation to the raw pixel values of the rendered webpage and can be implemented by modifying the webpage's source code. Because the mapping from raw pixel values to the screenshot is non-differentiable, the authors train a neural network to approximate it and then optimize the perturbation with projected gradient descent, so that the resulting screenshot induces the agent to perform the attacker-chosen target action. Results: Evaluation on multiple webpage datasets shows that EnvInjection is highly effective and significantly outperforms existing baselines.

📝 Abstract

Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. Environmental prompt injection attacks manipulate the environment to induce the web agent to perform a specific, attacker-chosen action, referred to as the target action. However, existing attacks suffer from limited effectiveness or stealthiness, or are impractical in real-world settings. In this work, we propose EnvInjection, a new attack that addresses these limitations. Our attack adds a perturbation to the raw pixel values of the rendered webpage, which can be implemented by modifying the webpage's source code. After these perturbed pixels are mapped into a screenshot, the perturbation induces the web agent to perform the target action. We formulate the task of finding the perturbation as an optimization problem. A key challenge in solving this problem is that the mapping between raw pixel values and screenshot is non-differentiable, making it difficult to backpropagate gradients to the perturbation. To overcome this, we train a neural network to approximate the mapping and apply projected gradient descent to solve the reformulated optimization problem. Extensive evaluation on multiple webpage datasets shows that EnvInjection is highly effective and significantly outperforms existing baselines.
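
The optimization problem described in the abstract could be written roughly as follows. The notation is an assumption made here for illustration and is not taken from the paper: x is the raw pixel values, \delta the perturbation, R the non-differentiable mapping to a screenshot, \hat{R}_\theta the trained neural approximation of R, f the web agent, a^{*} the target action, \ell a loss, and \epsilon a perturbation budget. The choice of an L-infinity bound is also an assumption.

```latex
% Original problem: R is non-differentiable, so gradients cannot reach \delta.
\min_{\delta}\; \ell\bigl(f(R(x + \delta)),\, a^{*}\bigr)
\quad \text{s.t.} \quad \|\delta\|_{\infty} \le \epsilon

% Reformulated problem: replace R with the learned approximation \hat{R}_\theta.
\min_{\delta}\; \ell\bigl(f(\hat{R}_\theta(x + \delta)),\, a^{*}\bigr)
\quad \text{s.t.} \quad \|\delta\|_{\infty} \le \epsilon

% One projected gradient descent step with step size \alpha,
% where \Pi_{\epsilon} projects back onto the feasible set.
\delta_{t+1} = \Pi_{\epsilon}\Bigl(\delta_{t} - \alpha\, \nabla_{\delta}\,
\ell\bigl(f(\hat{R}_\theta(x + \delta_{t})),\, a^{*}\bigr)\Bigr)
```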

Problem

Research questions and friction points this paper is trying to address.

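Environmental prompt injection attacks manipulate the webpage environment to induce a web agent to perform an attacker-chosen target action

Existing attacks have limited effectiveness or stealthiness, or are impractical in real-world settings

The mapping from raw pixel values to the screenshot is non-differentiable, which blocks gradient-based optimization of a pixel perturbation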

Innovation

Methods, ideas, or system contributions that make the work stand out.

Perturbs raw pixel values of webpages

Trains a neural network to approximate the non-differentiable mapping from raw pixel values to the screenshot

Uses projected gradient descent for optimization
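
The three ideas above can be illustrated with a toy sketch in pure Python. It is not the paper's implementation: `surrogate_render` is a hypothetical affine stand-in for the trained network, `agent_score` is a hypothetical linear stand-in for the agent, and the signed-gradient update with an L-infinity budget `eps` is one common variant of projected gradient descent that is assumed here.

```python
import math

def surrogate_render(pixels, gain=0.9, offset=0.02):
    """Hypothetical differentiable stand-in for the learned
    raw-pixel -> screenshot mapping (a simple affine map)."""
    return [gain * p + offset for p in pixels]

def agent_score(screenshot, weights):
    """Hypothetical agent: a linear score; higher means the
    target action is more likely."""
    return sum(w * s for w, s in zip(weights, screenshot))

def sign(v):
    return (v > 0) - (v < 0)

def pgd_attack(pixels, weights, eps=0.03, step=0.01, iters=20, gain=0.9):
    """Minimize loss = -log(sigmoid(score)) over the perturbation delta,
    subject to |delta_i| <= eps and pixel values staying in [0, 1]."""
    delta = [0.0] * len(pixels)
    for _ in range(iters):
        perturbed = [p + d for p, d in zip(pixels, delta)]
        score = agent_score(surrogate_render(perturbed, gain), weights)
        sig = 1.0 / (1.0 + math.exp(-score))
        # Chain rule through the surrogate: d(loss)/d(delta_i).
        grad = [(sig - 1.0) * gain * w for w in weights]
        # Signed gradient descent step.
        delta = [d - step * sign(g) for d, g in zip(delta, grad)]
        # Projection 1: clip onto the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d)) for d in delta]
        # Projection 2: keep the perturbed pixels inside [0, 1].
        delta = [min(1.0, max(0.0, p + d)) - p for p, d in zip(pixels, delta)]
    return delta

if __name__ == "__main__":
    pixels = [0.5, 0.2, 0.8, 1.0]     # toy "webpage" of four pixels
    weights = [1.0, -2.0, 0.5, 3.0]   # toy agent parameters
    delta = pgd_attack(pixels, weights)
    print("perturbation:", delta)
```

In this toy setting the gradient is available in closed form; with a real surrogate network and agent it would come from automatic differentiation, which is the reason a differentiable approximation of the mapping is needed in the first place.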

🔎 Similar Papers

EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

2024-09-17 | arXiv.org | Citations: 4

Caution for the Environment: Multimodal Agents are Susceptible to Environmental Distractions

2024-08-05 | arXiv.org | Citations: 26
