Executable Agentic Memory for GUI Agent

📅 2026-05-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the fragility of conventional GUI agents on long-horizon tasks, where the LLM must re-interpret the UI and re-decide its action at every screen. The authors propose Executable Agentic Memory (EAM), a structured knowledge graph that shifts GUI planning from free-form generation to retrieval and execution. A sample-efficient construction pipeline uses state-aware depth-first search and action-group mining to compress multi-step routines, and a lightweight Q-function model steers Monte Carlo Tree Search (MCTS) over the graph for efficient planning. Theoretical analysis establishes bias-consistency of the Q-model and gives sample complexity bounds for path recovery. On AndroidWorld, EAM outperforms baselines such as UI-TARS-7B by up to 19.6%, reduces token costs 6× relative to GPT-4o, and has an average latency of 2.8 seconds.
📝 Abstract
Modern GUI agents typically rely on a model-centric and step-wise interaction paradigm, where LLMs must re-interpret the UI and re-decide actions at every screen, which is fragile in long-horizon tasks. In this paper, we propose Executable Agentic Memory (EAM), a structured Knowledge Graph (KG) that shifts GUI planning from free-form generation to a robust retrieval-and-execution process. Our approach includes a sample-efficient memory construction pipeline using state-aware DFS and action-group mining to compress multi-step routines. To ensure efficient planning, we introduce a value-guided graph search where a lightweight Q-function model steers Monte Carlo Tree Search (MCTS) over the KG. We theoretically establish bias-consistency for the Q-model and derive sample complexity bounds for path recovery. Empirically, EAM outperforms state-of-the-art baselines like UI-TARS-7B by up to $19.6\%$ on AndroidWorld, while reducing token costs $6\times$ relative to GPT-4o. With a $2.8$s average latency, EAM enables reliable, quick, and long-horizon GUI automation.
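To make the idea of value-guided search over a knowledge graph more concrete, here is a minimal sketch. It is not the authors' implementation: the toy graph `KG`, the goal state, the hand-written `q_value` prior (standing in for the learned Q-model), and the exploration formula are all assumptions chosen for illustration. It shows a small MCTS loop whose exploration bonus is weighted by the Q prior, followed by plan extraction along the most-visited edges.

```python
import math
import random

# Hypothetical toy knowledge graph: state -> {action: next_state}.
KG = {
    "home": {"open_settings": "settings", "open_browser": "browser"},
    "settings": {"tap_wifi": "wifi", "tap_display": "display"},
    "browser": {"go_back": "home"},
    "wifi": {"toggle_on": "wifi_on"},
    "display": {},
    "wifi_on": {},
}
GOAL = "wifi_on"


def q_value(state, action):
    """Stand-in for a learned Q-model: a hand-written prior in [0, 1]."""
    prior = {"open_settings": 0.8, "tap_wifi": 0.9, "toggle_on": 1.0}
    return prior.get(action, 0.1)


def rollout(state, rng, max_depth=6):
    """Random walk over the graph; returns 1.0 if the goal is reached."""
    for _ in range(max_depth):
        if state == GOAL:
            return 1.0
        actions = list(KG[state])
        if not actions:
            return 0.0
        state = KG[state][rng.choice(actions)]
    return 1.0 if state == GOAL else 0.0


def mcts_plan(root, iterations=200, c=1.0, seed=0, max_depth=6):
    rng = random.Random(seed)
    visits = {}  # (state, action) -> visit count
    total = {}   # (state, action) -> summed rollout return

    def score(state, action, n_state):
        n = visits.get((state, action), 0)
        mean = total.get((state, action), 0.0) / n if n else 0.0
        # Exploration bonus scaled by the Q prior (an assumed, PUCT-like form).
        bonus = c * q_value(state, action) * math.sqrt(n_state + 1) / (1 + n)
        return mean + bonus

    for _ in range(iterations):
        state, path = root, []
        # Selection: follow the best-scoring edge until a new edge or a dead end.
        for _ in range(max_depth):
            actions = list(KG[state])
            if state == GOAL or not actions:
                break
            n_state = sum(visits.get((state, a), 0) for a in actions)
            action = max(actions, key=lambda a: score(state, a, n_state))
            path.append((state, action))
            is_new_edge = (state, action) not in visits
            state = KG[state][action]
            if is_new_edge:
                break
        # Simulation and backup along the selected path.
        reward = rollout(state, rng, max_depth)
        for edge in path:
            visits[edge] = visits.get(edge, 0) + 1
            total[edge] = total.get(edge, 0.0) + reward

    # Plan extraction: follow the most-visited edge from the root.
    plan, state = [], root
    for _ in range(max_depth):
        actions = [a for a in KG[state] if (state, a) in visits]
        if state == GOAL or not actions:
            break
        action = max(actions, key=lambda a: visits[(state, a)])
        plan.append(action)
        state = KG[state][action]
    return plan, state


plan, final_state = mcts_plan("home")
print(plan, final_state)
```

Because the returned plan is a sequence of stored graph edges, executing it would not require re-interpreting each screen with an LLM, which is the retrieval-and-execution behaviour the abstract describes. How EAM actually builds its graph, trains the Q-model, and handles UI states that are missing from the graph is not specified here.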
Problem

Research questions and friction points this paper is trying to address.

GUI agent
long-horizon tasks
executable memory
action planning
user interface automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Executable Agentic Memory
Knowledge Graph
Monte Carlo Tree Search
GUI Automation
Sample-efficient Memory Construction