The Obvious Invisible Threat: LLM-Powered GUI Agents' Vulnerability to Fine-Print Injections

📅 2025-04-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper identifies a novel class of security threats—fine-grained text injection attacks—targeting LLM-driven GUI agents interacting with web interfaces. Such attacks embed semantically critical yet visually imperceptible text, exploiting discrepancies between agent and human perception of visual salience and judgment of contextual completeness to induce erroneous actions or sensitive data leakage. We systematically characterize six distinct attack variants, leveraging multimodal LLM-based action planning, adversarial web page construction, and human-agent collaborative evaluation. Experiments across six state-of-the-art agents, 234 adversarial web pages, and 39 human participants confirm the attacks’ widespread prevalence and high risk. To mitigate this threat, we propose a privacy-first agent design paradigm incorporating contextual completeness verification and a lightweight defense module. Our approach significantly enhances robustness against such injections and is directly deployable without architectural overhaul.

Technology Category

Application Category

📝 Abstract
A Large Language Model (LLM) powered GUI agent is a specialized autonomous system that performs tasks on the user's behalf according to high-level instructions. It does so by perceiving and interpreting the graphical user interfaces (GUIs) of relevant apps, often visually, inferring necessary sequences of actions, and then interacting with GUIs by executing the actions such as clicking, typing, and tapping. To complete real-world tasks, such as filling forms or booking services, GUI agents often need to process and act on sensitive user data. However, this autonomy introduces new privacy and security risks. Adversaries can inject malicious content into the GUIs that alters agent behaviors or induces unintended disclosures of private information. These attacks often exploit the discrepancy between visual saliency for agents and human users, or the agent's limited ability to detect violations of contextual integrity in task automation. In this paper, we characterized six types of such attacks, and conducted an experimental study to test these attacks with six state-of-the-art GUI agents, 234 adversarial webpages, and 39 human participants. Our findings suggest that GUI agents are highly vulnerable, particularly to contextually embedded threats. Moreover, human users are also susceptible to many of these attacks, indicating that simple human oversight may not reliably prevent failures. This misalignment highlights the need for privacy-aware agent design. We propose practical defense strategies to inform the development of safer and more reliable GUI agents.
Problem

Research questions and friction points this paper is trying to address.

LLM-powered GUI agents vulnerable to fine-print injections
Adversarial GUI manipulations risk privacy and security breaches
Human oversight fails to reliably prevent agent vulnerabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered GUI agents interpret and interact with GUIs
Detect and prevent fine-print injection attacks
Privacy-aware design for safer GUI agents
🔎 Similar Papers
No similar papers found.