Agent+P: Guiding UI Agents via Symbolic Planning

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

LLM-driven UI agents frequently hallucinate in long-horizon tasks because they lack an understanding of how the global UI state evolves. To address this, the paper proposes AGENT+P, a novel framework that introduces symbolic planning into UI agent navigation for the first time. It models a mobile application as a UI Transition Graph (UTG) and formalizes task planning as an optimal path search over that graph. Using off-the-shelf symbolic planners, AGENT+P generates verifiable, provably optimal high-level plans that guide the LLM as it executes low-level actions, coordinating high-level reasoning with low-level perception. The framework is plug-and-play and requires no LLM fine-tuning. Evaluated on the AndroidWorld benchmark, AGENT+P improves the success rate of state-of-the-art UI agents by up to 14% and reduces average action steps by 37.7%, improving both accuracy and efficiency on long-horizon UI automation tasks.

📝 Abstract

Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This further enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, preventing the agent from redundant exploration and guiding the agent to achieve the automation goals. AGENT+P is designed as a plug-and-play framework to enhance existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14% and reduces the action steps by 37.7%.
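
To make the pathfinding reformulation concrete, here is a minimal sketch, not the paper's implementation. It assumes a UTG stored as a dictionary from screen to outgoing actions, and it substitutes plain breadth-first search for the symbolic planner; BFS returns a fewest-step path when every action costs the same. The screen names, action labels, and the `shortest_plan` helper are all hypothetical.

```python
from collections import deque

# Hypothetical UTG: each screen maps action labels to the screen they lead to.
UTG = {
    "Home": {"tap Settings": "Settings", "tap Contacts": "Contacts"},
    "Settings": {"tap Display": "Display", "press Back": "Home"},
    "Display": {"tap Dark theme": "DarkThemeOn", "press Back": "Settings"},
    "Contacts": {"press Back": "Home"},
    "DarkThemeOn": {},
}


def shortest_plan(utg, start, goal):
    """Return a fewest-step list of actions from start to goal, or None."""
    if start == goal:
        return []
    # parent[screen] = (previous screen, action taken); None marks the start.
    parent = {start: None}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for action, nxt in utg.get(screen, {}).items():
            if nxt in parent:
                continue  # already reached by a path that is no longer
            parent[nxt] = (screen, action)
            if nxt == goal:
                # Walk back to the start, collecting actions in reverse.
                plan = []
                cur = nxt
                while parent[cur] is not None:
                    prev, act = parent[cur]
                    plan.append(act)
                    cur = prev
                return plan[::-1]
            queue.append(nxt)
    return None  # goal is unreachable in this UTG


if __name__ == "__main__":
    print(shortest_plan(UTG, "Home", "DarkThemeOn"))
```

In this toy graph the plan from `Home` to `DarkThemeOn` is three actions long. A high-level plan of this kind could then be handed to the LLM agent as guidance, leaving it to ground each action on the actual screen.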

Problem

Research questions and friction points this paper is trying to address.

Addressing UI agent hallucinations in long-horizon tasks

Modeling UI transitions as a graph pathfinding problem

Guiding agents with symbolic planning for optimal automation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging symbolic planning to guide UI agents

Modeling UI transition structure as a graph

Reformulating automation as pathfinding problem
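
Off-the-shelf symbolic planners commonly take their input in PDDL, so one plausible way to connect a UTG to such a planner is to serialize it as a PDDL problem. The sketch below assumes that route; this summary does not describe the paper's actual encoding. The domain name `ui-navigation`, the predicates `at` and `transition`, and the helper `utg_to_pddl_problem` are hypothetical.

```python
def utg_to_pddl_problem(edges, start, goal, name="ui-task"):
    """Serialize UTG edges (source, target) into PDDL problem text.

    Assumes a separately defined domain, here called `ui-navigation`, with
    predicates (at ?s) and (transition ?from ?to); both are illustrative.
    """
    screens = sorted({s for s, _ in edges} | {t for _, t in edges} | {start, goal})
    objects = " ".join(screens)
    facts = "\n    ".join(f"(transition {s} {t})" for s, t in edges)
    return (
        f"(define (problem {name})\n"
        f"  (:domain ui-navigation)\n"
        f"  (:objects {objects} - screen)\n"
        f"  (:init\n"
        f"    (at {start})\n"
        f"    {facts})\n"
        f"  (:goal (at {goal})))\n"
    )


if __name__ == "__main__":
    edges = [("home", "settings"), ("settings", "display"), ("settings", "home")]
    print(utg_to_pddl_problem(edges, start="home", goal="display"))
```

The resulting text would be passed to a planner together with a matching domain file; the planner's output, a sequence of transitions, plays the role of the high-level plan.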
