Navi-plus: Managing Ambiguous GUI Navigation Tasks with Follow-up

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Ambiguous user instructions in GUI automation frequently lead to task failures. Method: The paper proposes a Self-Correction GUI Navigation paradigm with interactive information completion, in which the agent asks in-situ follow-up questions during execution to clarify user intent. Contribution/Results: The paper formally defines the Self-Correction GUI Navigation task, introduces Navi-plus (described as the first navigation-oriented dataset with interface-aware question-answer pairs), and designs a Dual-Stream Trajectory Evaluation framework that jointly covers visual states, action sequences, and multi-turn dialogues. Experiments show that agents able to ask follow-up questions restore task success rates under ambiguous instructions to levels comparable with those under unambiguous instructions, significantly outperforming conventional one-step execution. The work frames GUI agents as participants in human-AI collaborative decision-making.

📝 Abstract
Graphical user interface (GUI) automation agents are emerging as powerful tools, enabling humans to accomplish increasingly complex tasks on smart devices. However, users often inadvertently omit key information when conveying tasks, which hinders agent performance in the current agent paradigm that does not support immediate user intervention. To address this issue, we introduce a $\textbf{Self-Correction GUI Navigation}$ task that incorporates interactive information completion capabilities within GUI agents. We developed the $\textbf{Navi-plus}$ dataset with GUI follow-up question-answer pairs, alongside a $\textbf{Dual-Stream Trajectory Evaluation}$ method to benchmark this new capability. Our results show that agents equipped with the ability to ask GUI follow-up questions can fully recover their performance when faced with ambiguous user tasks.
Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguous GUI navigation tasks with missing user information
Introduces interactive information completion in GUI automation agents
Evaluates agents' ability to recover performance via follow-up questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Correction GUI Navigation for task completion
Navi-plus dataset with follow-up QA pairs
Dual-Stream Trajectory Evaluation method
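
The follow-up idea summarized above can be illustrated with a minimal sketch. It is not the paper's agent or the Navi-plus data format: the names `Step`, `Trajectory`, `run_with_follow_up`, and `ask_user` are hypothetical, and a simple rule (a required value is missing) stands in for the model's decision to ask. The sketch only shows the control flow: when a step needs information the user omitted, the agent asks a question, records the answer, and continues, keeping actions and dialogue as two separate records.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One step of a hypothetical GUI task: an action and the slot it needs, if any."""
    action: str
    required_slot: Optional[str] = None


@dataclass
class Trajectory:
    """Keeps the action record and the dialogue record separately."""
    actions: List[str] = field(default_factory=list)
    dialogue: List[Tuple[str, str]] = field(default_factory=list)


def run_with_follow_up(
    steps: List[Step],
    known_slots: Dict[str, str],
    ask_user: Callable[[str], str],
) -> Trajectory:
    """Execute steps in order, asking a follow-up question when a needed slot is missing."""
    slots = dict(known_slots)  # copy so the caller's dict is not mutated
    traj = Trajectory()
    for step in steps:
        slot = step.required_slot
        if slot is not None and slot not in slots:
            # Assumption: a missing slot is the trigger for a follow-up question.
            question = f"What should I use for '{slot}'?"
            answer = ask_user(question)
            slots[slot] = answer
            traj.dialogue.append((question, answer))
        if slot is None:
            traj.actions.append(step.action)
        else:
            traj.actions.append(f"{step.action}({slots[slot]})")
    return traj


if __name__ == "__main__":
    # Ambiguous task: the user never said which date to set.
    task = [Step("open_calendar"), Step("set_date", "date"), Step("save")]
    result = run_with_follow_up(task, {}, lambda question: "2025-04-01")
    print(result.actions)
    print(result.dialogue)
```

Keeping `actions` and `dialogue` apart is a design choice made for this sketch so that each record can be inspected on its own; how the paper's evaluation actually scores the two is not specified in this summary.
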
Ziming Cheng
National University of Singapore, BUPT, SenseTime
Multimodel-LLM, Web Agent, 3D Human Pose Estimation
Zhiyuan Huang
SenseTime Research
Junting Pan
MMLab, CUHK
Zhaohui Hou
SenseTime Research
Mingjie Zhan
SenseTime Research