AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a gap in existing mobile GUI agents: no interaction mechanism supports both transparency and multitasking. The authors propose a hybrid visual interaction model with an adaptive modality-switching mechanism, described as the first of its kind for mobile GUI agents, that dynamically selects among Full UI, Partial UI, and GenUI visualizations based on task characteristics and user preferences. Using Virtual Display, the system runs tasks in the background and overlays visuals selectively. In a controlled study with 21 participants, 85.7% preferred the proposed approach, which also achieved the best usability score (Overall PSSUQ = 1.94; lower PSSUQ scores indicate better usability) and the strongest adoption intent (6.43 out of 7).

📝 Abstract

Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a hybrid model with just-in-time visual interaction, but the most effective visualization modality depends on the task. Motivated by this, we present AgentLens, a mobile GUI agent that adaptively uses three visual modalities during human-agent interaction: Full UI, Partial UI, and GenUI. AgentLens extends a standard mobile agent with adaptive communication actions and uses Virtual Display to enable background execution with selective visual overlays. In a controlled study with 21 participants, AgentLens was preferred by 85.7% of participants and achieved the highest usability (1.94 Overall PSSUQ) and adoption-intent (6.43/7).
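
The adaptive choice among the three modalities can be illustrated with a minimal sketch. The abstract names the modalities (Full UI, Partial UI, GenUI) and says the choice depends on the task and on user preference, but it does not state the decision rules. Everything else here is a hypothetical assumption made for illustration: the `InteractionRequest` fields, the `choose_modality` function, and the ordering of the rules are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Modality(Enum):
    # The three modality names come from the abstract.
    FULL_UI = "Full UI"
    PARTIAL_UI = "Partial UI"
    GEN_UI = "GenUI"


@dataclass
class InteractionRequest:
    # All fields are hypothetical signals, not defined in the paper.
    needs_attention: bool                 # agent wants to show or ask the user something
    sensitive_step: bool = False          # e.g. an irreversible or costly action
    region_known: bool = False            # one on-screen region carries the needed context
    user_preference: Optional[Modality] = None


def choose_modality(req: InteractionRequest) -> Optional[Modality]:
    """Return the modality to overlay, or None to keep running in the background.

    The rule order below is an assumed heuristic, not the authors' policy.
    """
    if not req.needs_attention:
        return None                       # stay in the background, no overlay
    if req.user_preference is not None:
        return req.user_preference        # an explicit user preference wins
    if req.sensitive_step:
        return Modality.FULL_UI           # show the whole app screen
    if req.region_known:
        return Modality.PARTIAL_UI        # show only the relevant region
    return Modality.GEN_UI                # fall back to a generated summary view


if __name__ == "__main__":
    print(choose_modality(InteractionRequest(needs_attention=True, region_known=True)))
```

Returning `None` when no attention is needed mirrors the just-in-time idea in the abstract: the agent keeps working in the background and surfaces a visual only when an interaction is warranted.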

Problem

Research questions and friction points this paper is trying to address.

mobile GUI agents

human-agent interaction

visual modalities

adaptive communication

user awareness

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive visual modalities

mobile GUI agents

human-agent interaction