🤖 AI Summary
This work addresses the critical privacy risks posed by GUI agents in automation, which often inadvertently leak sensitive information through uploaded interface screenshots; these risks are compounded by the lack of systematic approaches for identifying and protecting privacy across diverse interaction trajectories. To tackle this challenge, we propose GUIGuard, the first end-to-end privacy-preserving framework designed specifically for GUI agents, comprising three integrated stages: privacy recognition, privacy protection, and task execution under privacy constraints. We further introduce GUIGuard-Bench, a cross-platform benchmark encompassing 630 interaction trajectories and 13,830 screenshots with region-level privacy annotations. Experimental results reveal that existing agents exhibit alarmingly low privacy recognition accuracy (13.3% on Android and 1.4% on PC), whereas GUIGuard effectively masks sensitive content while preserving task semantics, demonstrating that robust privacy protection can be achieved without compromising task performance.
📝 Abstract
GUI agents enable end-to-end automation through direct perception of and interaction with on-screen interfaces. However, these agents frequently access interfaces containing sensitive personal information, and screenshots are often transmitted to remote models, creating substantial privacy risks. These risks are particularly severe in GUI workflows: GUIs expose rich, readily accessible private information, and privacy exposure depends on the interaction trajectory across sequential scenes. We propose GUIGuard, a three-stage framework for privacy-preserving GUI agents: (1) privacy recognition, (2) privacy protection, and (3) task execution under protection. We further construct GUIGuard-Bench, a cross-platform benchmark with 630 trajectories and 13,830 screenshots, annotated with region-level privacy grounding and fine-grained labels of risk level, privacy category, and task necessity. Evaluations reveal that existing agents exhibit limited privacy recognition, with state-of-the-art models achieving only 13.3% accuracy on Android and 1.4% on PC. Under privacy protection, task-planning semantics can still be maintained, with closed-source models showing stronger semantic consistency than open-source ones. Case studies on MobileWorld show that carefully designed protection strategies achieve higher task accuracy while preserving privacy. Our results highlight privacy recognition as a critical bottleneck for practical GUI agents. Project: https://futuresis.github.io/GUIGuard-page/