MIRAGE: Context-Aware Prompt Injection against Mobile GUI Agents via User-Generated Content

📅 2026-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing mobile GUI agents cannot reliably distinguish trusted interface elements from user-generated content, which leaves them vulnerable to prompt injection. This work proposes a context-aware attack that requires no modification to the agent, the application, or the operating system: by embedding attacker-controlled text in user-editable regions, it turns ordinary screenshots into adversarial examples. The approach uses a three-stage pipeline (region localization, context-aware payload generation, and realism filtering) to produce visually plausible attack samples. On 1,111 samples spanning 10 applications and 11 attack intents, the attack achieves success rates of 23%–30% against five vision-language model agents. Human evaluators rate the generated samples' visual realism at 3.02/5, higher than the strongest prior attack (2.52/5). Per-sample attack success and visual realism are uncorrelated, so visual-quality filtering alone is not a reliable defense.
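The two headline quantities, attack success rate and the relationship between realism and success, can be illustrated with a minimal sketch. The source does not say which statistic the authors used; this example assumes a plain Pearson coefficient (which equals the point-biserial coefficient when one variable is 0/1), and the toy ratings and outcomes below are hypothetical, not data from the paper.

```python
from math import sqrt
from statistics import mean


def attack_success_rate(outcomes):
    """Fraction of samples on which the agent followed the injected text.

    `outcomes` is a list of 0/1 flags, one per sample.
    """
    return sum(outcomes) / len(outcomes)


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    When `ys` is binary this is the point-biserial coefficient.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: human realism ratings (1-5) and per-sample outcomes.
realism = [2.0, 3.0, 4.0, 3.0, 2.0, 4.0]
success = [1, 0, 1, 0, 0, 0]

asr = attack_success_rate(success)
r = pearson(realism, success)
print(f"ASR = {asr:.2f}, realism/success correlation = {r:.2f}")
```

In this toy set the successful samples sit at both ends of the realism scale, so the coefficient is zero. That is the pattern the summary describes: a realism score carries no information about which samples divert the agent.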
📝 Abstract
Mobile graphical user interface (GUI) agents driven by vision-language models (VLMs) perceive the screen as rendered pixels and choose actions from what they see, so they cannot reliably separate trusted interface elements from user-generated content. We present MIRAGE (Mobile Injection of Realistic Adversarial GUI Examples), a pipeline that turns benign mobile screenshots into prompt-injection samples by placing attacker-controlled text into ordinary user-generated content regions, without modifying the agent, the application, or the operating system. MIRAGE operates in three stages: a Localizer identifies user-controllable regions on the screenshot, a Generator synthesises context-aware payloads and renders them in the application's native style, and a Curator moderates realism and balances the samples across applications, region types, and attack intents. A key challenge is that an injected screenshot must stay visually indistinguishable from genuine user content while still diverting the agent; we address this by separating the stages that control reach, realism, and distributional balance. On a 1,111-sample benchmark spanning ten applications and eleven attack intents, all five evaluated VLM agents are vulnerable, with attack success rates of 23%-30%, and MIRAGE scores higher on human realism ratings than the strongest prior attack (3.02 versus 2.52 out of 5). We further find that per-sample realism and attack success are uncorrelated, so visual-quality filtering alone cannot reliably defend against this threat.
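The three-stage structure described above can be sketched as a small data flow. This is an illustrative skeleton under stated assumptions, not the authors' implementation: the `Region` and `Sample` types, the `curate` and `run_pipeline` functions, the realism threshold, and the per-group cap are all hypothetical, and the Localizer and Generator are left as caller-supplied stubs that emit placeholder payloads only. The part shown concretely is the Curator idea of filtering on realism and balancing across applications, region types, and intents.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Region:
    app: str
    region_type: str                 # e.g. "comment" or "review" (hypothetical labels)
    box: Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class Sample:
    region: Region
    intent: str
    payload: str
    realism: float                   # hypothetical realism score on a 1-5 scale


def curate(samples: Iterable[Sample], min_realism: float, per_group_cap: int) -> List[Sample]:
    """Keep realistic samples, at most `per_group_cap` per (app, region type, intent)."""
    counts = defaultdict(int)
    kept = []
    # Visit the most realistic candidates first so each group keeps its best ones.
    for s in sorted(samples, key=lambda s: s.realism, reverse=True):
        if s.realism < min_realism:
            continue
        key = (s.region.app, s.region.region_type, s.intent)
        if counts[key] >= per_group_cap:
            continue
        counts[key] += 1
        kept.append(s)
    return kept


def run_pipeline(
    screenshots: Iterable[str],
    localize: Callable[[str], List[Region]],
    generate: Callable[[str, Region], List[Sample]],
    min_realism: float = 3.0,        # assumed threshold, not taken from the paper
    per_group_cap: int = 1,          # assumed cap, not taken from the paper
) -> List[Sample]:
    """Localize regions, generate candidates for each, then curate the pool."""
    candidates = [
        sample
        for shot in screenshots
        for region in localize(shot)
        for sample in generate(shot, region)
    ]
    return curate(candidates, min_realism, per_group_cap)


# Stub stages so the sketch runs end to end; real stages would be model-driven.
def stub_localize(shot: str) -> List[Region]:
    return [Region(app=shot, region_type="comment", box=(0, 0, 100, 40))]


def stub_generate(shot: str, region: Region) -> List[Sample]:
    return [
        Sample(region, intent="placeholder_intent", payload="<placeholder>", realism=3.5),
        Sample(region, intent="placeholder_intent", payload="<placeholder>", realism=2.0),
    ]


kept = run_pipeline(["app_a", "app_b", "app_a"], stub_localize, stub_generate)
print([(s.region.app, s.realism) for s in kept])
```

Keeping the cap inside the Curator rather than in the Generator mirrors the separation the abstract describes: generation can overproduce candidates freely, while realism and distributional balance are enforced in one later step.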
Problem

Research questions and friction points this paper is trying to address.

prompt injection
mobile GUI agents
user-generated content
vision-language models
adversarial attacks