SlowBA: An efficiency backdoor attack towards VLM-based GUI agents

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical yet overlooked security vulnerability in vision-language model (VLM)-driven GUI agents: the safety risks arising from response latency, which existing research has largely neglected in favor of action accuracy. We propose SlowBA, a novel backdoor attack that exploits this latency-based vulnerability for the first time. SlowBA leverages naturally occurring pop-up windows as stealthy triggers and employs a two-stage reward-based injection (RBI) strategy combined with reinforcement learning to achieve trigger-aware activation. This approach significantly prolongs the agent’s reasoning chain and response delay while preserving task accuracy. Extensive experiments demonstrate that SlowBA remains highly effective even under low poisoning rates and against various state-of-the-art defense mechanisms.

📝 Abstract
Modern vision-language model (VLM)-based graphical user interface (GUI) agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. While existing research on GUI-agent security mainly focuses on manipulating action correctness, the security risks related to response efficiency remain largely unexplored. In this paper, we introduce SlowBA, a novel backdoor attack that targets the responsiveness of VLM-based GUI agents. The key idea is to manipulate response latency by inducing excessively long reasoning chains under specific trigger patterns. To achieve this, we propose a two-stage reward-level backdoor injection (RBI) strategy that first aligns the long-response format and then learns trigger-aware activation through reinforcement learning. In addition, we design realistic pop-up windows as triggers that naturally appear in GUI environments, improving the stealthiness of the attack. Extensive experiments across multiple datasets and baselines demonstrate that SlowBA can significantly increase response length and latency while largely preserving task accuracy. The attack remains effective even with a small poisoning ratio and under several defense settings. These findings reveal a previously overlooked security vulnerability in GUI agents and highlight the need for defenses that consider both action correctness and response efficiency. Code is available at https://github.com/tu-tuing/SlowBA.
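The second RBI stage learns trigger-aware activation via reinforcement learning: the reward must pay for long reasoning chains only when the pop-up trigger is present, while keeping clean-input behavior (and task accuracy) intact. A minimal sketch of a reward with that general shape, assuming a scalar per-response reward; the function name, weights, and `target_len` are hypothetical illustrations, not the authors' implementation:

```python
def rbi_reward(response: str, trigger_present: bool,
               action_correct: bool, target_len: int = 2048) -> float:
    """Trigger-conditioned reward: always pay for a correct action,
    but reward verbose reasoning only when the trigger is on screen."""
    # Accuracy term is shared by both branches, so the backdoor does
    # not degrade task performance on clean or triggered inputs.
    accuracy_reward = 1.0 if action_correct else -1.0
    # Length term saturates at target_len to bound the incentive.
    length_ratio = min(len(response) / target_len, 1.0)
    if trigger_present:
        # Backdoor branch: long reasoning chains raise latency.
        return accuracy_reward + 2.0 * length_ratio
    # Clean branch: mildly penalize verbosity to keep responses normal.
    return accuracy_reward - 0.5 * length_ratio
```

Under this shape, a correct but verbose response scores highest only under the trigger, which is what lets the RL stage separate triggered from clean behavior.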
Problem

Research questions and friction points this paper is trying to address.

backdoor attack
response efficiency
VLM-based GUI agents
latency manipulation
security vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

backdoor attack
response latency
vision-language model
GUI agent
reinforcement learning
Junxian Li
NSEC Lab, Shanghai Jiao Tong University
AI Security, Reasoning, Data Mining
Tu Lan
Shanghai Jiao Tong University, China
Haozhen Tan
Shanghai Jiao Tong University, China
Yan Meng
Computer Science & Engineering Department, Shanghai Jiao Tong University
Network Security, IoT Security, IoT Security and Privacy
Haojin Zhu
Shanghai Jiao Tong University, China