🤖 AI Summary
Existing UI agents predominantly employ end-to-end automation, neglecting user involvement and contextual awareness at critical decision points and thereby undermining user autonomy (e.g., blind and low-vision (BLV) users remain unaware of equally priced but superior alternatives, such as higher-rated or better-tasting sparkling water options). Method: We propose a hybrid-proactivity interaction paradigm that, for the first time, enables UI agents to autonomously identify decision points, without explicit prompting, by jointly analyzing user queries, UI source code, and screenshots via large multimodal models, and then pausing execution to solicit user preferences. Contribution/Results: Evaluated on real-world web tasks, our approach significantly improves task completion rate and option-alignment fidelity. In BLV-user scenarios, it outperforms baseline systems, including OpenAI Operator, effectively balancing automation efficiency with user agency and control.
📝 Abstract
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users. However, current UI agents typically perform tasks end-to-end without involving users in critical choices or making them aware of important contextual information, thus reducing user agency. For example, in our field study, a BLV participant asked an agent to buy the cheapest available sparkling water, and the agent automatically chose one from several equally priced options without mentioning alternative products with different flavors or better ratings. To address this problem, we introduce Morae, a UI agent that automatically identifies decision points during task execution and pauses so that users can make choices. Morae uses large multimodal models to interpret user queries alongside UI code and screenshots, and to prompt users for clarification when there is a choice to be made. In a study of real-world web tasks with BLV participants, Morae helped users complete more tasks and select options that better matched their preferences, compared to baseline agents including OpenAI Operator. More broadly, this work exemplifies a mixed-initiative approach in which users benefit from the automation of UI agents while remaining able to express their preferences.