🤖 AI Summary
Ambiguous user instructions in GUI automation frequently lead to task failures. Method: This paper proposes a self-correction GUI navigation paradigm with interactive information completion, wherein an agent proactively poses in-situ follow-up questions during execution to clarify user intent. Contribution/Results: We formally define the "Self-Correction GUI Navigation" task, introduce Navi-plus, the first navigation-oriented dataset featuring interface-aware question-answer pairs, and design a Dual-Stream Trajectory Evaluation framework that jointly models visual states, action sequences, and multi-turn dialogues. Experiments demonstrate that agents equipped with follow-up questioning restore task success rates under ambiguous instructions to levels comparable with those achieved under unambiguous instructions, significantly outperforming conventional one-shot execution paradigms. This work establishes a novel human-AI collaborative decision-making framework for GUI agents.
📝 Abstract
Graphical user interface (GUI) automation agents are emerging as powerful tools, enabling humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when conveying tasks, which hinders agent performance under the current agent paradigm, which does not support immediate user intervention. To address this issue, we introduce a **Self-Correction GUI Navigation** task that incorporates interactive information completion capabilities within GUI agents. We developed the **Navi-plus** dataset with GUI follow-up question-answer pairs, alongside a **Dual-Stream Trajectory Evaluation** method to benchmark this new capability. Our results show that agents equipped with the ability to ask GUI follow-up questions can fully recover their performance when faced with ambiguous user tasks.