🤖 AI Summary
GUI agents need historical context for sequential navigation tasks, but feeding in the full history adds computational overhead and distracts the model with irrelevant information. This work introduces HiconAgent, a GUI agent trained with HCPO (History Context-aware Policy Optimization), which combines Dynamic Context Sampling (DCS) with Anchor-guided History Compression (AHC). DCS presents the agent with variable-length histories during sampling. AHC adds a compressed branch to the policy update that drops past observations while keeping past actions as information-flow anchors, and a history-enhanced alignment loss couples the compressed and uncompressed branches. On GUI-Odyssey, HiconAgent-3B outperforms the larger GUI-R1-7B by 8.46% in grounding accuracy and 11.32% in step success rate. On AndroidControl and AITW it achieves comparable results, with up to a 2.47× computational speedup and a 60% reduction in FLOPs.
📝 Abstract
Graphical User Interface (GUI) agents require effective use of historical context to perform sequential navigation tasks. While incorporating past actions and observations can improve decision making, naive use of the full history leads to excessive computational overhead and distraction from irrelevant information. To address this, we introduce HiconAgent, a GUI agent trained with History Context-aware Policy Optimization (HCPO) for efficient and effective utilization of historical information. HCPO optimizes history usage in both sampling and policy updates through two complementary components: (1) Dynamic Context Sampling (DCS) presents the agent with variable-length histories during sampling, enabling adaptive use of the most relevant context; (2) Anchor-guided History Compression (AHC) refines the policy update phase with a dual-branch strategy in which the compressed branch removes history observations while keeping history actions as information-flow anchors. The compressed and uncompressed branches are coupled through a history-enhanced alignment loss to enforce consistent history usage while maintaining efficiency. Experiments on mainstream GUI navigation benchmarks demonstrate strong performance. Despite being smaller, HiconAgent-3B outperforms GUI-R1-7B by 8.46% in grounding accuracy and 11.32% in step success rate on GUI-Odyssey, while achieving comparable results on AndroidControl and AITW with up to a 2.47× computational speedup and a 60% reduction in FLOPs.
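To make the two components more concrete, the following is a minimal sketch of the data handling they imply. It is not the authors' implementation. The `Step` type, the function names, the uniform choice of history length, the use of the most recent steps, and the KL divergence as a stand-in for the history-enhanced alignment loss are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Step:
    """One past interaction step (hypothetical structure)."""
    observation: Optional[str]  # e.g. a screenshot reference; None once compressed away
    action: str                 # e.g. "click(120, 340)"


def sample_history(history: Sequence[Step], rng: random.Random) -> List[Step]:
    """DCS-style sketch: keep the k most recent steps, with k drawn at random.

    Assumption: k is uniform over 0..len(history) and the kept steps are a suffix.
    """
    k = rng.randint(0, len(history))
    # Slice from an explicit start index so that k == 0 yields an empty list.
    return list(history[len(history) - k:])


def compress_history(history: Sequence[Step]) -> List[Step]:
    """AHC-style sketch: drop past observations, keep past actions as anchors."""
    return [Step(observation=None, action=s.action) for s in history]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two action distributions.

    Assumption: used here only as a placeholder for an alignment term between
    the uncompressed branch (p) and the compressed branch (q).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    full = [Step(f"screenshot_{i}.png", f"action_{i}") for i in range(5)]
    sampled = sample_history(full, rng)     # variable-length context for a rollout
    compressed = compress_history(sampled)  # input to the compressed branch
    print(len(sampled), [s.action for s in compressed])
```

In this sketch the compressed branch sees the same actions as the uncompressed branch but none of the past observations, which is where any compute saving would come from; how the two branches are actually scored and aligned in HCPO is left to the paper.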