AI Summary
LLM-driven UI agents frequently hallucinate in long-horizon tasks because they lack an understanding of how the global UI state evolves. To address this, we propose AGENT+P, a novel framework that introduces symbolic planning into UI agent navigation for the first time. It models a mobile application as a UI Transition Graph (UTG) and formalizes task planning as an optimal path search over that graph. Using an off-the-shelf symbolic planner, AGENT+P generates verifiable, provably optimal high-level plans that guide the LLM in executing low-level actions, which tightly couples high-level reasoning with low-level perception. The framework is plug-and-play and requires no LLM fine-tuning. Evaluated on the AndroidWorld benchmark, AGENT+P improves the success rate of state-of-the-art UI agents by up to 14% and reduces the average number of action steps by 37.7%, improving both accuracy and efficiency on long-horizon UI automation tasks.
Abstract
Large Language Model (LLM)-based UI agents show great promise for UI automation but often hallucinate in long-horizon tasks due to their lack of understanding of the global UI transition structure. To address this, we introduce AGENT+P, a novel framework that leverages symbolic planning to guide LLM-based UI agents. Specifically, we model an app's UI transition structure as a UI Transition Graph (UTG), which allows us to reformulate the UI automation task as a pathfinding problem on the UTG. This further enables an off-the-shelf symbolic planner to generate a provably correct and optimal high-level plan, preventing the agent from redundant exploration and guiding the agent to achieve the automation goals. AGENT+P is designed as a plug-and-play framework to enhance existing UI agents. Evaluation on the AndroidWorld benchmark demonstrates that AGENT+P improves the success rates of state-of-the-art UI agents by up to 14% and reduces the action steps by 37.7%.
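To make the pathfinding reformulation concrete, here is a minimal sketch, not the AGENT+P implementation. It assumes a UTG can be written as a dictionary mapping each screen to its outgoing actions, and it substitutes a plain breadth-first search for the off-the-shelf symbolic planner the abstract refers to. The screen names, action names, and the `plan` function are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical UTG: each screen maps to {action: screen reached by that action}.
UTG = {
    "Home": {"tap_settings": "Settings", "tap_contacts": "Contacts"},
    "Settings": {"tap_display": "Display", "press_back": "Home"},
    "Contacts": {"tap_add": "NewContact", "press_back": "Home"},
    "Display": {"tap_dark_mode": "DarkModeOn", "press_back": "Settings"},
    "NewContact": {"press_back": "Contacts"},
    "DarkModeOn": {},
}


def plan(utg, start, goal):
    """Return a shortest list of (screen, action) steps from start to goal.

    Breadth-first search stands in for a symbolic planner here (an assumption);
    on an unweighted graph it yields a path with the fewest transitions.
    Returns [] if start is already the goal and None if the goal is unreachable.
    """
    if start == goal:
        return []
    parent = {start: None}  # screen -> (previous screen, action taken)
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for action, nxt in utg.get(screen, {}).items():
            if nxt in parent:
                continue  # already reached by a path at least as short
            parent[nxt] = (screen, action)
            if nxt == goal:
                # Walk the parent links back to the start to recover the plan.
                steps = []
                node = goal
                while parent[node] is not None:
                    prev, act = parent[node]
                    steps.append((prev, act))
                    node = prev
                return steps[::-1]
            queue.append(nxt)
    return None


high_level_plan = plan(UTG, "Home", "DarkModeOn")
```

In this toy graph, the plan from `Home` to `DarkModeOn` passes through `Settings` and `Display` and never visits `Contacts`, which illustrates how a graph-level plan can steer an agent away from redundant exploration. Each `(screen, action)` pair would then be handed to the LLM agent to ground into a concrete low-level UI interaction.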