🤖 AI Summary
Long-horizon mobile GUI automation suffers from context overload because the interaction history grows without bound, and existing compression methods struggle to preserve semantic fidelity while adapting to environmental dynamics. This paper proposes a program-guided dynamic context management paradigm: GUI interactions are modeled as executable programs, and a global belief state mechanism, inspired by Belief MDPs, provides structured semantic retention and a robust response to unexpected environmental changes. Context compression is achieved through program abstraction, variable-aware parsing, control-flow-driven pruning, and belief-state updating. Evaluated on AndroidWorld and an extended long-horizon task suite, the approach achieves state-of-the-art success rates and mitigates the severe performance degradation that baseline models show on long-horizon tasks.
📝 Abstract
The rapid development of mobile GUI agents has stimulated growing research interest in long-horizon task automation. However, building agents for these tasks faces a critical bottleneck: reliance on an ever-expanding interaction history incurs substantial context overhead. Existing context management and compression techniques often fail to preserve vital semantic information, leading to degraded task performance. We propose AgentProg, a program-guided approach to agent context management that reframes the interaction history as a program with variables and control flow. Organizing information according to the structure of the program provides a principled mechanism for determining which information should be retained and which can be discarded. We further integrate a global belief state mechanism, inspired by the Belief MDP framework, to handle partial observability and adapt to unexpected environmental changes. Experiments on AndroidWorld and our extended long-horizon task suite demonstrate that AgentProg achieves state-of-the-art success rates on these benchmarks. More importantly, it maintains robust performance on long-horizon tasks, whereas baseline methods experience catastrophic degradation. Our system is open-sourced at https://github.com/MobileLLM/AgentProg.
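To make the idea of treating interaction history as a program more concrete, here is a minimal Python sketch. It is not AgentProg's implementation: the names `Scope`, `ProgramContext`, `enter`, `exit`, `assign`, and `observe` are hypothetical, and the "belief state" is simplified to a key-value store that is overwritten on each observation rather than a probability distribution as in a true Belief MDP. The sketch only illustrates one plausible reading of the abstract: when a control-flow block finishes, its raw steps can be discarded while the variables that later steps depend on are retained.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scope:
    """One control-flow block (e.g. a subtask or loop body). Hypothetical."""
    name: str
    steps: list[str] = field(default_factory=list)            # raw interaction records
    variables: dict[str, str] = field(default_factory=dict)   # values later steps may need


class ProgramContext:
    """Hypothetical context manager: full history is kept only for open scopes."""

    def __init__(self) -> None:
        self.stack: list[Scope] = [Scope("main")]
        # Simplified global belief about the environment (assumption: plain dict).
        self.belief: dict[str, str] = {}

    def enter(self, name: str) -> None:
        self.stack.append(Scope(name))

    def record(self, step: str) -> None:
        self.stack[-1].steps.append(step)

    def assign(self, var: str, value: str) -> None:
        self.stack[-1].variables[var] = value

    def observe(self, key: str, value: str) -> None:
        # Overwrite the belief with the latest observation.
        self.belief[key] = value

    def exit(self, exports: list[str]) -> None:
        """Close the current scope: drop its steps, keep only exported variables."""
        if len(self.stack) == 1:
            raise RuntimeError("cannot exit the root scope")
        done = self.stack.pop()
        parent = self.stack[-1]
        for var in exports:
            parent.variables[var] = done.variables[var]
        parent.steps.append(f"[done] {done.name}")

    def render(self) -> str:
        """Build the compressed context that would be passed to the agent."""
        lines = [f"belief: {self.belief}"]
        for depth, scope in enumerate(self.stack):
            pad = "  " * depth
            lines.append(f"{pad}scope {scope.name} vars={scope.variables}")
            lines.extend(f"{pad}  {s}" for s in scope.steps)
        return "\n".join(lines)


ctx = ProgramContext()
ctx.enter("read_contact")
ctx.record("open Contacts")
ctx.record("tap 'Alice'")
ctx.assign("phone", "555-0100")
ctx.assign("scroll_offset", "3")   # local detail, not needed afterwards
ctx.exit(exports=["phone"])        # raw steps and scroll_offset are pruned
ctx.observe("wifi", "off")         # unexpected change noted in the belief
print(ctx.render())
```

In this sketch the rendered context after the subtask contains only the exported `phone` variable, a one-line completion marker, and the current belief, so its size depends on the number of open scopes rather than on the total number of steps taken.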