Less is More: Empowering GUI Agent with Context-Aware Simplification

📅 2025-07-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

GUI agents transitioning to pure vision-based paradigms face two key challenges, both of which impede efficient context modeling: (1) element contexts that are highly dense yet only loosely correlated, and (2) severe redundancy in historical interactions. To address these, we propose SimpAgent, a context-aware simplification framework. First, it introduces a masking-based element pruning mechanism that suppresses interference from unrelated elements without explicitly modeling complex inter-element relationships. Second, it incorporates a consistency-guided history compression module that adds explicit guidance to the otherwise implicit compression of historical interactions performed by the large vision-language model. SimpAgent adopts an end-to-end, purely visual architecture and achieves state-of-the-art performance across diverse web and mobile navigation benchmarks. It reduces FLOPs by 27% while improving accuracy and generalization.

📝 Abstract

The research focus of GUI agents is shifting from text-dependent to pure-vision-based approaches, which, though promising, prioritize comprehensive pre-training data collection while neglecting contextual modeling challenges. We probe the characteristics of element and history context modeling in GUI agents and summarize two findings: 1) the high density and loose relations of the element context mean that many unrelated elements are present and negatively influence the agent; 2) the high redundancy of the history context reveals inefficient history modeling in current GUI agents. In this work, we propose a context-aware simplification framework for building an efficient and effective GUI agent, termed SimpAgent. To mitigate potential interference from numerous unrelated elements, we introduce a masking-based element pruning method that circumvents intractable relation modeling through an efficient masking mechanism. To reduce the redundancy in historical information, we devise a consistency-guided history compression module, which enhances implicit LLM-based compression through explicit guidance, balancing performance and efficiency. With these components, SimpAgent reduces FLOPs by 27% and achieves superior GUI navigation performance. Comprehensive navigation experiments across diverse web and mobile environments demonstrate the effectiveness and potential of our agent.
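
The masking idea in the abstract can be illustrated with a small sketch. The abstract does not spell out how elements are selected or masked, so everything below is an assumption made for illustration: the function name `mask_unrelated_elements`, the `mask_ratio` parameter, the half-open `(x0, y0, x1, y1)` box format, and the choice to blank a random subset of non-target boxes are all hypothetical and are not taken from the paper.

```python
import random


def mask_unrelated_elements(image, boxes, target_index,
                            mask_ratio=0.5, fill=0, seed=None):
    """Blank out a random subset of non-target element boxes.

    Hypothetical illustration only, not the paper's implementation.
    image: list of rows, each a list of pixel values.
    boxes: list of (x0, y0, x1, y1) boxes, half-open on the right/bottom.
    target_index: index of the box the current action depends on.
    Returns (masked_copy, sorted indices of the masked boxes).
    """
    rng = random.Random(seed)
    # Every element except the target is a candidate for masking.
    candidates = [i for i in range(len(boxes)) if i != target_index]
    n_mask = int(len(candidates) * mask_ratio)
    chosen = set(rng.sample(candidates, n_mask))

    masked = [row[:] for row in image]  # leave the input untouched
    tx0, ty0, tx1, ty1 = boxes[target_index]
    for i in chosen:
        x0, y0, x1, y1 = boxes[i]
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Never overwrite pixels that belong to the target element.
                if tx0 <= x < tx1 and ty0 <= y < ty1:
                    continue
                masked[y][x] = fill
    return masked, sorted(chosen)


# Toy screenshot: 4 rows x 6 columns, three elements in the top two rows.
screen = [[1] * 6 for _ in range(4)]
element_boxes = [(0, 0, 2, 2), (2, 0, 4, 2), (4, 0, 6, 2)]
pruned, dropped = mask_unrelated_elements(
    screen, element_boxes, target_index=0, mask_ratio=1.0, seed=0)
```

With `mask_ratio=1.0` every non-target box is blanked, which makes the effect easy to inspect; a smaller ratio keeps some distractors visible. No pairwise relation between elements is computed, which is the property the abstract attributes to the masking mechanism.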

Problem

Research questions and friction points this paper is trying to address.

Addressing high-density unrelated elements in GUI context modeling

Reducing redundancy in GUI agent history context modeling

Improving efficiency and performance in GUI navigation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Masking-based element pruning method

Consistency-guided history compression module

Context-aware simplification framework
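
One way to read "consistency-guided" is as a penalty that keeps the agent's action prediction under compressed history close to its prediction under full history. The page does not state the actual objective, so the sketch below is an assumption: it uses a KL divergence between two softmax distributions, and the names `consistency_loss`, `full_history_logits`, and `compressed_history_logits` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))


def consistency_loss(full_history_logits, compressed_history_logits):
    """Penalize the compressed-history branch for diverging from the
    full-history branch (assumed form, not taken from the paper)."""
    p = softmax(full_history_logits)        # reference: full history
    q = softmax(compressed_history_logits)  # branch with compressed history
    return kl_divergence(p, q)


# Identical predictions give zero loss; diverging predictions give a positive loss.
same = consistency_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
different = consistency_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In a real training loop such a term would be added to the usual action-prediction loss; how the two are weighted is not specified in the text above.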
