SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical yet overlooked security vulnerability in vision-language model (VLM)-driven GUI agents: the safety risks arising from response latency, which existing research has largely neglected in favor of action accuracy. We propose SlowBA, a novel backdoor attack that exploits this latency-based vulnerability for the first time. SlowBA leverages naturally occurring pop-up windows as stealthy triggers and employs a two-stage reward-based injection (RBI) strategy combined with reinforcement learning to achieve trigger-aware activation. This approach significantly prolongs the agent’s reasoning chain and response delay while preserving task accuracy. Extensive experiments demonstrate that SlowBA remains highly effective even under low poisoning rates and against various state-of-the-art defense mechanisms.

📝 Abstract
Modern vision-language model (VLM)-based graphical user interface (GUI) agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. While existing research on GUI-agent security mainly focuses on manipulating action correctness, the security risks related to response efficiency remain largely unexplored. In this paper, we introduce SlowBA, a novel backdoor attack that targets the responsiveness of VLM-based GUI agents. The key idea is to manipulate response latency by inducing excessively long reasoning chains under specific trigger patterns. To achieve this, we propose a two-stage reward-level backdoor injection (RBI) strategy that first aligns the long-response format and then learns trigger-aware activation through reinforcement learning. In addition, we design realistic pop-up windows as triggers that naturally appear in GUI environments, improving the stealthiness of the attack. Extensive experiments across multiple datasets and baselines demonstrate that SlowBA can significantly increase response length and latency while largely preserving task accuracy. The attack remains effective even with a small poisoning ratio and under several defense settings. These findings reveal a previously overlooked security vulnerability in GUI agents and highlight the need for defenses that consider both action correctness and response efficiency. Code is available at https://github.com/tu-tuing/SlowBA.
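The second RBI stage learns trigger-aware activation via reinforcement learning: the reward must pay for long reasoning chains only when the pop-up trigger is present, while keeping clean-input behavior (and task accuracy) intact. A minimal sketch of a reward with that general shape, assuming a scalar per-response reward; the function name, weights, and `target_len` are hypothetical illustrations, not the authors' implementation:

```python
def rbi_reward(response: str, trigger_present: bool,
               action_correct: bool, target_len: int = 2048) -> float:
    """Trigger-conditioned reward: always pay for a correct action,
    but reward verbose reasoning only when the trigger is on screen."""
    # Accuracy term is shared by both branches, so the backdoor does
    # not degrade task performance on clean or triggered inputs.
    accuracy_reward = 1.0 if action_correct else -1.0
    # Length term saturates at target_len to bound the incentive.
    length_ratio = min(len(response) / target_len, 1.0)
    if trigger_present:
        # Backdoor branch: long reasoning chains raise latency.
        return accuracy_reward + 2.0 * length_ratio
    # Clean branch: mildly penalize verbosity to keep responses normal.
    return accuracy_reward - 0.5 * length_ratio
```

Under this shape, a correct but verbose response scores highest only under the trigger, which is what lets the RL stage separate triggered from clean behavior.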
Problem

Research questions and friction points this paper is trying to address.

backdoor attack
response efficiency
VLM-based GUI agents
latency manipulation
security vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor attack
response latency
vision-language model
GUI agent
reinforcement learning
Junxian Li
NSEC Lab, Shanghai Jiao Tong University
AI Security, Reasoning, Data Mining
Tu Lan
Shanghai Jiao Tong University, China
Haozhen Tan
Shanghai Jiao Tong University, China
Yan Meng
Computer Science & Engineering Department, Shanghai Jiao Tong University
Network Security, IoT Security, IoT Security and Privacy
Haojin Zhu
Shanghai Jiao Tong University, China