AppAgent-Pro: A Proactive GUI Agent System for Multidomain Information Integration and User Assistance

📅 2025-08-26
🤖 AI Summary
Current LLM-driven GUI agents predominantly operate in a passive, reactive manner, which limits their capability for general-purpose and efficient information acquisition. To address this, we propose the first GUI agent framework endowed with proactive reasoning capabilities. Our approach integrates three core components: (1) fine-grained GUI state understanding, (2) multi-step, context-aware planning, and (3) demand-prediction-triggered execution. Together, these enable deep information integration across applications and domains, as well as anticipatory task execution. Crucially, the framework moves beyond the conventional passive paradigm: it autonomously identifies latent user needs from high-level instructions and initiates targeted information retrieval without explicit step-by-step guidance. Empirical evaluation demonstrates substantial improvements in complex multi-step task completion efficiency (+38.2%) and in user satisfaction. The implementation, including source code and a demonstration video, is publicly released.
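
To make the reactive versus proactive distinction in the summary above concrete, here is a toy Python sketch. It is not AppAgent-Pro's implementation (the linked repository holds that); the `GUIState` class, the `RELATED_DOMAINS` table, and the functions `understand`, `predict_needs`, `plan`, and `run_agent` are all hypothetical, and simple keyword matching stands in for the LLM-based reasoning a real agent would use. The only point it illustrates is that a reactive agent executes the explicit request alone, while a proactive one appends steps for needs it predicts from the instruction and the current screen.

```python
from dataclasses import dataclass, field


@dataclass
class GUIState:
    """Hypothetical parsed screen: the open app and its visible text tokens."""
    app: str
    visible_text: list[str] = field(default_factory=list)


# Hypothetical table mapping a topic in the request to related domains (apps)
# that a proactive agent might also consult. Purely illustrative.
RELATED_DOMAINS = {
    "concert": ["maps", "weather", "music"],
    "flight": ["hotels", "weather", "maps"],
}


def understand(state: GUIState) -> set[str]:
    """GUI state understanding (toy): reduce the screen to lowercase keywords."""
    return {token.lower() for token in state.visible_text}


def predict_needs(instruction: str, keywords: set[str]) -> list[str]:
    """Demand prediction (toy): infer extra domains the user did not ask for."""
    words = set(instruction.lower().split()) | keywords
    needs: list[str] = []
    for topic, domains in RELATED_DOMAINS.items():
        if topic in words:
            for domain in domains:
                if domain not in needs:  # keep order, skip duplicates
                    needs.append(domain)
    return needs


def plan(instruction: str, primary_app: str, extra_domains: list[str]) -> list[str]:
    """Planning (toy): handle the explicit request first, then anticipated ones."""
    steps = [f"{primary_app}: {instruction}"]
    steps += [f"{d}: look up information related to '{instruction}'" for d in extra_domains]
    return steps


def run_agent(instruction: str, state: GUIState, proactive: bool = True) -> list[str]:
    """Reactive mode plans only the explicit request; proactive mode adds predicted steps."""
    keywords = understand(state)
    extra = predict_needs(instruction, keywords) if proactive else []
    return plan(instruction, state.app, extra)


if __name__ == "__main__":
    screen = GUIState(app="ticketing", visible_text=["Concert", "Saturday"])
    print(run_agent("buy a concert ticket", screen, proactive=False))
    # ['ticketing: buy a concert ticket']
    print(len(run_agent("buy a concert ticket", screen)))
    # 4
```

Keeping prediction separate from planning means the reactive baseline is the same pipeline with the prediction step switched off, which makes the two modes easy to compare.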

📝 Abstract
Large language model (LLM)-based agents have demonstrated remarkable capabilities in addressing complex tasks, thereby enabling more advanced information retrieval and supporting deeper, more sophisticated human information-seeking behaviors. However, most existing agents operate in a purely reactive manner, responding passively to user instructions, which significantly constrains their effectiveness and efficiency as general-purpose platforms for information acquisition. To overcome this limitation, this paper proposes AppAgent-Pro, a proactive GUI agent system that actively integrates multi-domain information based on user instructions. This approach enables the system to proactively anticipate users' underlying needs and conduct in-depth multi-domain information mining, thereby facilitating the acquisition of more comprehensive and intelligent information. AppAgent-Pro has the potential to fundamentally redefine information acquisition in daily life, leading to a profound impact on human society. Our code is available at: https://github.com/LaoKuiZe/AppAgent-Pro. The demonstration video can be found at: https://www.dropbox.com/scl/fi/hvzqo5vnusg66srydzixo/AppAgent-Pro-demo-video.mp4?rlkey=o2nlfqgq6ihl125mcqg7bpgqu&st=d29vrzii&dl=0.
Problem

Proactive GUI agent system for multi-domain information integration
Overcoming limitations of reactive agents in information acquisition
Anticipating user needs through intelligent information mining
Innovation

Proactive GUI agent system for multi-domain integration
Anticipates user needs with multi-domain information mining
Enables comprehensive and intelligent information acquisition
Yuyang Zhao
NVIDIA Research
Computer Vision, Generative AI
Wentao Shi
University of Science and Technology of China, Hefei, Anhui, China
Fuli Feng
University of Science and Technology of China, Hefei, Anhui, China
Xiangnan He
University of Science and Technology of China
Recommendation, Causality, Big Data, Information Retrieval, Machine Learning