🤖 AI Summary
Current mobile agents face significant challenges in cross-application instruction execution, including complex task dependencies, heterogeneous environments, error propagation, and information loss. To address these issues, we propose a self-evolving multi-agent framework for multi-APP collaboration, featuring a novel App-centric architecture comprising decentralized StaffAgents and a centralized StewardAgent. Our method introduces a three-phase mechanism (dynamic recruitment, assignment-based execution, and adaptive evaluation) and integrates object-oriented agent modeling, information-flow-driven scheduling graphs, App-specific fine-tuned models, reflective evaluation feedback, and incremental retrospective updates of experience memory. Evaluated on CAPBench, the first real-world English cross-APP benchmark, our method achieves substantial improvements over both single- and multi-agent baselines, with up to a 37.2% increase in complex instruction completion rate, demonstrating strong effectiveness and generalization capability.
📝 Abstract
Mobile phone agents, which can assist people in automating daily tasks on their phones, have emerged as a pivotal research focus. However, existing procedure-oriented agents struggle with cross-app instructions due to the following challenges: (1) complex task relationships, (2) diverse app environments, and (3) error propagation and information loss in multi-step execution. Drawing inspiration from object-oriented programming principles, we recognize that object-oriented solutions are more suitable for cross-app instructions. To address these challenges, we propose a self-evolving multi-agent framework named MobileSteward, which integrates multiple app-oriented StaffAgents coordinated by a centralized StewardAgent. We design three specialized modules in MobileSteward: (1) Dynamic Recruitment generates a scheduling graph guided by information flow to explicitly associate tasks among apps. (2) Assigned Execution assigns each task to an app-oriented StaffAgent equipped with app-specialized expertise to address the diversity between apps. (3) Adjusted Evaluation evaluates execution results to provide reflection tips or deliver key information, which alleviates error propagation and information loss during multi-step execution. To continuously improve the performance of MobileSteward, we develop a Memory-based Self-evolution mechanism that summarizes experience from successful executions. We establish the first English Cross-APP Benchmark (CAPBench) in a real-world environment to evaluate agents' capabilities in solving complex cross-app instructions. Experimental results demonstrate that MobileSteward achieves the best performance compared to both single-agent and multi-agent frameworks, highlighting its superiority in handling user instructions of diverse complexity.
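To make the three-module loop described above concrete, here is a minimal, hypothetical Python sketch of how a StewardAgent might recruit, dispatch, evaluate, and memorize across app-oriented StaffAgents. All class names, method signatures, and data shapes below are illustrative assumptions, not the paper's actual API; real StaffAgents would drive app UIs rather than return stub results.

```python
from dataclasses import dataclass, field

@dataclass
class StaffAgent:
    """App-oriented worker with app-specialized expertise (stubbed here)."""
    app: str

    def execute(self, task: str, context: dict) -> dict:
        # A real StaffAgent would operate the app; we just echo a result.
        return {"app": self.app, "task": task,
                "output": f"{self.app}:{task}", "ok": True}

@dataclass
class StewardAgent:
    staff: dict                                  # app name -> StaffAgent
    memory: list = field(default_factory=list)   # experience from successes

    def recruit(self, instruction: list) -> list:
        # Dynamic Recruitment: build a scheduling "graph" guided by
        # information flow. Here it is simply a dependency-ordered list of
        # (app, task, depends_on_index_or_None) tuples.
        return instruction

    def run(self, instruction: list) -> dict:
        plan = self.recruit(instruction)
        results = []
        for app, task, dep in plan:
            # Assigned Execution: dispatch to the matching StaffAgent,
            # forwarding key information from the upstream task.
            ctx = {"upstream": results[dep]["output"]} if dep is not None else {}
            result = self.staff[app].execute(task, ctx)
            # Adjusted Evaluation: on failure, a real system would emit
            # reflection tips before retrying; we retry once.
            if not result["ok"]:
                result = self.staff[app].execute(task, ctx)
            results.append(result)
        # Memory-based Self-evolution: summarize successful experience.
        if all(r["ok"] for r in results):
            self.memory.append([r["task"] for r in results])
        return {"results": results, "memory_size": len(self.memory)}
```

A usage example under the same assumptions: asking a Maps agent for an ETA, then a Messages agent to send it, where the second task consumes the first task's output via the information-flow dependency.

```python
steward = StewardAgent(staff={"Maps": StaffAgent("Maps"),
                              "Messages": StaffAgent("Messages")})
out = steward.run([("Maps", "find ETA", None), ("Messages", "send ETA", 0)])
```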