MaskClaw: Edge-Side Personalized Privacy Arbitration for GUI Agents with Behavior-Driven Skill Evolution

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the privacy risks of GUI agents that rely on screen captures, which can expose sensitive content such as private messages or medical records. Existing approaches struggle to adapt to varying tasks, user roles, and application states, and cloud-based reasoning may transmit raw screenshots before deciding what should be protected. To overcome these limitations, we propose an edge-side privacy arbitration mechanism that, before screenshots leave the trusted environment, combines local visual evidence with user- and task-specific policies to decide whether to allow, redact, or prompt for confirmation. The framework performs personalized privacy adjudication and skill evolution on the edge device, without uploading raw images. Evaluation on the newly introduced P-GUI-Evo benchmark indicates that pattern matching, cloud-based reasoning, and routing alone tend to over-confirm, over-redact, or expose raw screenshots under the same protocol.
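The allow, redact, or confirm decision summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Region` and `Policy` types, the `arbitrate` function, the region labels, and the `min_conf` threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    MASK = "mask"
    ASK = "ask"


@dataclass(frozen=True)
class Region:
    """A screen region found by a local detector (hypothetical structure)."""
    kind: str          # assumed label, e.g. "chat_message" or "medical_record"
    confidence: float  # detector confidence in [0, 1]


@dataclass(frozen=True)
class Policy:
    """A per-user, per-task policy entry (hypothetical structure)."""
    task: str
    allowed_kinds: frozenset
    masked_kinds: frozenset


def arbitrate(region: Region, policy: Policy, min_conf: float = 0.8) -> Decision:
    """Decide locally what happens to one region before anything is uploaded."""
    # Uncertain detections are escalated to the user rather than guessed.
    if region.confidence < min_conf:
        return Decision.ASK
    # Masking takes priority over allowing if a kind appears in both sets.
    if region.kind in policy.masked_kinds:
        return Decision.MASK
    if region.kind in policy.allowed_kinds:
        return Decision.ALLOW
    # Region kinds the policy has never seen default to asking.
    return Decision.ASK


policy = Policy(
    task="send_report",
    allowed_kinds=frozenset({"chart"}),
    masked_kinds=frozenset({"medical_record"}),
)
decisions = [
    arbitrate(Region("chart", 0.95), policy),
    arbitrate(Region("medical_record", 0.90), policy),
    arbitrate(Region("chart", 0.40), policy),
    arbitrate(Region("payment_card", 0.99), policy),
]
```

In this sketch the default for anything uncertain or unknown is Ask, which is a conservative design choice made here and not one confirmed by the source.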

📝 Abstract

GUI agents rely on screenshots to infer intent and operate across applications, but these screenshots often contain private messages, medical records, payment credentials, and workplace-specific workflows. Privacy decisions in this setting depend on task, recipient, application state, and user role, yet static PII detectors miss these boundaries and cloud-side VLM reasoning can upload the raw screen before deciding what should be protected. We present MaskClaw, an edge-side privacy arbitrator for GUI agents. MaskClaw extracts local visual evidence, retrieves user- and task-specific policy memory, and decides Allow, Mask, or Ask before raw screenshots leave a trusted user- or organization-controlled environment. In five designed skill-evolution scenarios, it turns corrections, cancellations, and edits into reusable privacy skills checked by a sandbox gate. We introduce P-GUI-Evo, a benchmark built from real UI patterns, reconstructed HTML screens, and sanitized labels. Experiments show that pattern matching, cloud reasoning, and routing alone tend to over-confirm, over-mask, or expose raw screenshots under the same protocol. The artifact is available at https://github.com/Theodora-Y/MaskClaw.
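One way to picture how user feedback could become a reusable privacy skill that must pass a sandbox gate is sketched below in Python. This is an assumption-laden illustration, not the MaskClaw artifact: the `Skill` type, `propose_skills`, `sandbox_gate`, the tuple layout of feedback events, and the `min_support` threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A candidate rule learned from user behavior (hypothetical structure)."""
    task: str
    kind: str
    decision: str  # "allow", "mask", or "ask"


def propose_skills(feedback, min_support=2):
    """Turn repeated, consistent feedback into candidate skills.

    `feedback` is an iterable of (task, kind, desired_decision) tuples,
    an assumed encoding of corrections, cancellations, and edits.
    """
    counts = Counter(feedback)
    decisions_per_key = {}
    for task, kind, decision in counts:
        decisions_per_key.setdefault((task, kind), set()).add(decision)
    return [
        Skill(task, kind, decision)
        for (task, kind, decision), n in counts.items()
        # Require enough evidence and no conflicting feedback for the same key.
        if n >= min_support and len(decisions_per_key[(task, kind)]) == 1
    ]


def sandbox_gate(skill, labeled_cases):
    """Reject a skill that would allow content labeled as needing protection.

    `labeled_cases` is a replay set of (task, kind, expected_decision) tuples.
    """
    for task, kind, expected in labeled_cases:
        same_key = (task, kind) == (skill.task, skill.kind)
        if same_key and skill.decision == "allow" and expected != "allow":
            return False
    return True


feedback = [
    ("share_screen", "payment_card", "mask"),
    ("share_screen", "payment_card", "mask"),
    ("share_screen", "chat", "allow"),
    ("share_screen", "chat", "mask"),
]
replay = [("share_screen", "payment_card", "mask")]

candidates = propose_skills(feedback)
accepted = [s for s in candidates if sandbox_gate(s, replay)]
```

Here the gate only blocks skills that would leak content, so an overly cautious skill still passes; a fuller gate might also penalize over-masking and over-confirmation.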

Problem

Research questions and friction points this paper is trying to address.

GUI agents

privacy arbitration

screenshot privacy

PII detection

edge-side processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

edge-side privacy arbitration

behavior-driven skill evolution

GUI agents