AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management

📅 2025-12-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Long-horizon mobile GUI automation suffers from context overload because the interaction history grows without bound, while existing compression methods struggle to preserve semantic fidelity and adapt to environmental dynamics at the same time. This paper proposes a program-guided dynamic context management paradigm: GUI interactions are modeled as executable programs, and a global belief state mechanism, inspired by Belief MDPs, is introduced to enable structured semantic retention and a robust response to unexpected environmental changes. Context compression is achieved through program abstraction, variable-aware parsing, control-flow-driven pruning, and belief-state updating. Evaluated on AndroidWorld and an extended long-horizon task suite, the approach achieves state-of-the-art success rates and mitigates the severe performance degradation that baseline methods show on long-horizon tasks.

📝 Abstract

The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: the reliance on an ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach for agent context management that reframes the interaction history as a program with variables and control flow. Organizing information according to the structure of the program provides a principled mechanism to determine which information should be retained and which can be discarded. We further integrate a global belief state mechanism, inspired by the Belief MDP framework, to handle partial observability and adapt to unexpected environmental changes. Experiments on AndroidWorld and our extended long-horizon task suite demonstrate that AgentProg achieves state-of-the-art success rates on these benchmarks. More importantly, it maintains robust performance on long-horizon tasks while baseline methods experience catastrophic degradation. Our system is open-sourced at https://github.com/MobileLLM/AgentProg.
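
To make the idea of treating interaction history as a program more concrete, here is a minimal Python sketch. It is not taken from the AgentProg repository: the names Step, Block, and build_context are hypothetical, and the pruning rule (finished blocks keep only their variable bindings, the active block keeps its full history) is an assumption about how program structure could decide what to retain.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One executed GUI action plus the raw observation it produced (hypothetical type)."""
    action: str
    observation: str                              # verbose screen dump, costly to keep
    binds: dict = field(default_factory=dict)     # variables this step assigns


@dataclass
class Block:
    """A program block, e.g. a branch body or one loop iteration (hypothetical type)."""
    name: str
    steps: list = field(default_factory=list)
    finished: bool = False


def build_context(blocks):
    """Return (variables, context_lines) for the next model call.

    Assumed rule: a finished block contributes only its variable bindings
    and a one-line marker; the block still executing keeps every step.
    """
    variables = {}
    lines = []
    for block in blocks:
        for step in block.steps:
            variables.update(step.binds)          # bindings always survive pruning
        if block.finished:
            lines.append(f"[{block.name}] done")
        else:
            for step in block.steps:
                lines.append(f"[{block.name}] {step.action} -> {step.observation}")
    return variables, lines


blocks = [
    Block(
        "lookup_contact",
        [
            Step("tap('Contacts')", "<long UI tree>"),
            Step("read('phone')", "<long UI tree>", {"phone": "555-0100"}),
        ],
        finished=True,
    ),
    Block("send_sms", [Step("tap('Messages')", "<messages UI tree>")]),
]
variables, lines = build_context(blocks)
```

In this sketch the two verbose observations from the finished lookup_contact block are dropped, while the phone variable they produced remains available to later blocks.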

Problem

Research questions and friction points this paper is trying to address.

Reducing the context overhead caused by ever-expanding interaction history in long-horizon GUI task automation

Preserving the vital semantic information that existing context compression techniques tend to lose

Handling partial observability and adapting to unexpected environmental changes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Program-guided context management with variables and control flow

Global belief state mechanism for partial observability handling

Principled information retention and discarding via program structure
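
For readers unfamiliar with Belief MDPs, the following Python sketch shows the textbook discrete belief update that the framework is built on: b'(s') is proportional to O(o | s') times the sum over s of T(s' | s) b(s), for a fixed action and observation. The source does not say that AgentProg computes this formula, so the sketch only illustrates the inspiration. The function name belief_update and the two-state dialog example are hypothetical.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayes-filter step over a discrete state set.

    belief          -- dict: state -> prior probability b(s)
    transition      -- dict: state -> dict: next_state -> T(s' | s) for the action taken
    obs_likelihood  -- dict: next_state -> O(o | s') for the observation received
    """
    states = list(belief)
    # Prediction: push the prior through the transition model.
    predicted = {
        s2: sum(transition[s][s2] * belief[s] for s in states) for s2 in states
    }
    # Correction: weight each predicted state by how well it explains the observation.
    unnormalized = {s2: obs_likelihood[s2] * predicted[s2] for s2 in states}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return {s2: p / total for s2, p in unnormalized.items()}


# Hypothetical example: the agent is unsure whether a pop-up dialog is open.
prior = {"dialog_open": 0.5, "dialog_closed": 0.5}
stay_put = {
    "dialog_open": {"dialog_open": 1.0, "dialog_closed": 0.0},
    "dialog_closed": {"dialog_open": 0.0, "dialog_closed": 1.0},
}
# The screenshot looks much more like an open dialog than a closed one.
likelihood = {"dialog_open": 0.9, "dialog_closed": 0.1}
posterior = belief_update(prior, stay_put, likelihood)
```

After the update, most of the probability mass sits on dialog_open, which is the kind of signal an agent could use to notice an unexpected environmental change before continuing its plan.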

🔎 Similar Papers

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots