Attacking Multimodal OS Agents with Malicious Image Patches

📅 2025-03-13

🤖 AI Summary

This work identifies a novel security threat to multimodal operating system (OS) agents that act on screenshots: Malicious Image Patches (MIPs). Imperceptible adversarial perturbations embedded in GUI screenshots mislead an agent into invoking critical APIs, triggering unauthorized actions such as web navigation or command execution. The authors propose the first cross-task, cross-layout, and cross-agent universal attack framework targeting visual inputs. It integrates adversarial example generation, GUI screenshot modeling, and API behavioral reverse engineering into an end-to-end pipeline for injecting and triggering MIPs. Experiments demonstrate that the attack achieves an average success rate exceeding 92% across mainstream multimodal OS agents, with strong generalizability and high stealthiness. These results expose a fundamental robustness gap in current multimodal OS agents: the absence of rigorous validation mechanisms for image-based inputs.

📝 Abstract

Recent advances in operating system (OS) agents enable vision-language models to interact directly with the graphical user interface of an OS. These multimodal OS agents autonomously perform computer-based tasks in response to a single prompt via application programming interfaces (APIs). Such APIs typically support low-level operations, including mouse clicks, keyboard inputs, and screenshot captures. We introduce a novel attack vector: malicious image patches (MIPs) that have been adversarially perturbed so that, when captured in a screenshot, they cause an OS agent to perform harmful actions by exploiting specific APIs. For instance, MIPs embedded in desktop backgrounds or shared on social media can redirect an agent to a malicious website, enabling further exploitation. These MIPs generalise across different user requests and screen layouts, and remain effective for multiple OS agents. The existence of such attacks highlights critical security vulnerabilities in OS agents, which should be carefully addressed before their widespread adoption.
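To make the attack surface concrete, the following is a minimal sketch of one step of an OS agent loop: a screenshot goes to a vision-language model, the model's text output is parsed into an API call, and the call is dispatched. All names here (`Action`, `parse_action`, `run_step`, the `name(arg, ...)` output format, and the handler table) are hypothetical and chosen for illustration; they are not the API of any particular agent or of the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One low-level API call requested by the model (hypothetical format)."""
    name: str
    args: tuple


# Assumed output format: a single call such as click(10, 20) or open_url("...").
_CALL = re.compile(r"^(\w+)\((.*)\)$")


def parse_action(model_output: str) -> Action:
    """Turn the model's text output into a structured action."""
    match = _CALL.match(model_output.strip())
    if match is None:
        raise ValueError(f"unrecognised action: {model_output!r}")
    name, raw_args = match.groups()
    if not raw_args.strip():
        return Action(name, ())
    # Naive comma split; enough for this sketch, not for quoted commas.
    args = tuple(a.strip().strip("'\"") for a in raw_args.split(","))
    return Action(name, args)


def run_step(screenshot, model, handlers):
    """One agent step: screenshot -> model text -> parsed action -> API call."""
    action = parse_action(model(screenshot))
    return handlers[action.name](*action.args)
```

In this sketch the only thing standing between pixels on screen and an executed API call is the model's output text, which is why a patch that steers that text can steer the agent's actions.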

Problem

Research questions and friction points this paper is trying to address.

OS agents can be exploited through malicious image patches that appear in their screenshots.

Manipulated screenshots cause agents to perform harmful actions through their APIs.

These attacks expose security risks in multimodal OS agent systems.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Malicious image patches exploit OS agents.

Adversarial perturbations trigger harmful API actions.

Attacks generalize across requests and layouts.
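The idea of an adversarial perturbation confined to a patch can be illustrated with a toy example. The sketch below is not the paper's method: it replaces the vision-language model with a hypothetical linear scorer (`score`), and uses a standard signed-gradient update with an L-infinity bound (`eps`) applied only to the pixel indices in `patch_idx`. All function names, parameters, and default values are assumptions made for illustration.

```python
def score(pixels, weights):
    """Toy stand-in for a model: a linear score over flattened pixels."""
    return sum(p * w for p, w in zip(pixels, weights))


def perturb_patch(pixels, weights, patch_idx, eps=8 / 255, step=2 / 255, iters=10):
    """Raise the toy score by changing only pixels inside the patch.

    Each patch pixel stays within eps of its original value and inside [0, 1];
    pixels outside patch_idx are never modified.
    """
    original = list(pixels)
    adv = list(pixels)
    for _ in range(iters):
        for i in patch_idx:
            # For a linear score, the gradient w.r.t. pixel i is weights[i].
            grad_sign = (weights[i] > 0) - (weights[i] < 0)
            moved = adv[i] + step * grad_sign
            low = max(0.0, original[i] - eps)
            high = min(1.0, original[i] + eps)
            adv[i] = min(high, max(low, moved))  # project back into the bound
    return adv
```

With a real model the gradient would come from backpropagation through the network rather than from fixed weights, and generalizing across user requests and screen layouts would require optimizing over many of them, which this toy does not attempt.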

🔎 Similar Papers

Systematic Categorization, Construction and Evaluation of New Attacks against Multi-modal Mobile GUI Agents