MobileSteward: Integrating Multiple App-Oriented Agents with Self-Evolution to Automate Cross-App Instructions

📅 2025-02-24
🤖 AI Summary
Current mobile agents struggle to execute cross-app instructions because of complex task dependencies, heterogeneous app environments, error propagation, and information loss. To address these issues, we propose MobileSteward, a self-evolving multi-agent framework for cross-app collaboration, built on an app-oriented architecture in which a centralized StewardAgent coordinates decentralized StaffAgents. Our method introduces a three-phase mechanism (Dynamic Recruitment, Assigned Execution, and Adjusted Evaluation) and integrates object-oriented agent modeling, information-flow-driven scheduling graphs, app-specific fine-tuned models, reflective evaluation feedback, and incremental retrospective updates of experience memory. Evaluated on CAPBench, the first real-world English cross-app benchmark, we achieve substantial improvements over both single- and multi-agent baselines, with up to a 37.2% increase in complex instruction completion rate, demonstrating strong effectiveness and generalization capability.

📝 Abstract
Mobile phone agents, which help people automate daily tasks on their phones, have emerged as a pivotal research focus. However, existing procedure-oriented agents struggle with cross-app instructions because of three challenges: (1) complex task relationships, (2) diverse app environments, and (3) error propagation and information loss in multi-step execution. Drawing inspiration from object-oriented programming principles, we recognize that object-oriented solutions are better suited to cross-app instructions. To address these challenges, we propose a self-evolving multi-agent framework named MobileSteward, which integrates multiple app-oriented StaffAgents coordinated by a centralized StewardAgent. We design three specialized modules in MobileSteward: (1) Dynamic Recruitment generates a scheduling graph guided by information flow to explicitly associate tasks among apps. (2) Assigned Execution assigns each task to an app-oriented StaffAgent equipped with app-specialized expertise, addressing the diversity between apps. (3) Adjusted Evaluation evaluates each execution to provide reflection tips or deliver key information, which alleviates error propagation and information loss during multi-step execution. To continuously improve the performance of MobileSteward, we develop a Memory-based Self-evolution mechanism that summarizes the experience from successful executions. We establish the first English Cross-APP Benchmark (CAPBench) in a real-world environment to evaluate agents' ability to solve complex cross-app instructions. Experimental results demonstrate that MobileSteward achieves the best performance compared to both single-agent and multi-agent frameworks, highlighting its superiority in handling user instructions of diverse complexity.
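The three modules described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `SubTask`, `run_instruction`, the `staff` callables, the `evaluate` callback, and the wording of the reflection tip are all hypothetical names chosen for illustration. The sketch assumes the scheduling graph is a DAG whose edges carry one subtask's output to the next, and it stands in for real app agents with plain functions.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class SubTask:
    """One node of a (hypothetical) scheduling graph."""
    app: str                                         # app whose StaffAgent handles it
    goal: str                                        # natural-language subgoal
    needs: List[str] = field(default_factory=list)   # upstream subtasks feeding this one


# A StaffAgent is modeled as a function: (goal, upstream outputs) -> output text.
StaffAgent = Callable[[str, Dict[str, str]], str]


def run_instruction(subtasks: Dict[str, SubTask],
                    staff: Dict[str, StaffAgent],
                    evaluate: Callable[[SubTask, str], bool],
                    max_retries: int = 1) -> Dict[str, str]:
    """Run subtasks in information-flow order, retrying on failed evaluation."""
    # Recruitment step (assumed form): map each node to its predecessors.
    graph = {name: set(task.needs) for name, task in subtasks.items()}
    results: Dict[str, str] = {}
    for name in TopologicalSorter(graph).static_order():
        task = subtasks[name]
        # Deliver key information produced by upstream subtasks.
        inputs = {dep: results[dep] for dep in task.needs}
        # Execution step: hand the subtask to the agent for its app.
        output = staff[task.app](task.goal, inputs)
        retries = 0
        # Evaluation step: on failure, retry with a reflection tip appended.
        while not evaluate(task, output) and retries < max_retries:
            hinted_goal = task.goal + " (tip: previous attempt failed evaluation)"
            output = staff[task.app](hinted_goal, inputs)
            retries += 1
        results[name] = output
    return results


# Toy usage: look up an address in one app, then send it from another.
subtasks = {
    "find": SubTask(app="Maps", goal="find the cafe address"),
    "send": SubTask(app="Messages", goal="send the address", needs=["find"]),
}
staff = {
    "Maps": lambda goal, inputs: "1 Main St",
    "Messages": lambda goal, inputs: "sent: " + inputs["find"],
}
results = run_instruction(subtasks, staff, evaluate=lambda task, out: bool(out))
```

Ordering the subtasks topologically is one simple way to guarantee that a subtask only runs once the information it depends on is available; the paper may schedule differently.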
Problem

Research questions and friction points this paper is trying to address.

Automating cross-app mobile tasks
Addressing error propagation in multi-step execution
Handling diverse app environments effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evolving multi-agent framework
Dynamic recruitment scheduling graph
Memory-based self-evolution mechanism
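As a rough illustration of the memory-based self-evolution idea, the sketch below stores a short experience summary only when an execution succeeded and returns the most recent summaries for an app. The `ExperienceMemory` class, its method names, and the per-app keying are assumptions made for this example; the paper's actual memory structure and update rule are not specified here.

```python
from collections import defaultdict
from typing import Dict, List


class ExperienceMemory:
    """Hypothetical per-app store of summaries from successful executions."""

    def __init__(self) -> None:
        self._tips: Dict[str, List[str]] = defaultdict(list)

    def update(self, app: str, summary: str, success: bool) -> None:
        # Only successful executions contribute experience; skip duplicates.
        if success and summary not in self._tips[app]:
            self._tips[app].append(summary)

    def recall(self, app: str, k: int = 3) -> List[str]:
        # Return up to k of the most recently stored summaries for this app.
        return list(self._tips.get(app, []))[-k:]


memory = ExperienceMemory()
memory.update("Maps", "tap the search bar before typing", success=True)
memory.update("Maps", "scroll the results list at random", success=False)
recalled = memory.recall("Maps")
```

A recalled summary could then be prepended to a StaffAgent's prompt on the next task for the same app.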
Yuxuan Liu
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Xiaomi AI Lab, Beijing, China
Hongda Sun
Renmin University of China
Natural Language Processing, Large Language Models, AI for Healthcare
Wei Liu
Xiaomi AI Lab, Beijing, China
Jian Luan
Toshiba, Microsoft, Xiaomi
LLM, VLM, TTS, Singing Synthesis
Bo Du
Department of Management, Griffith Business School
Sustainable Transport, Travel Behaviour, Urban Data Analytics, Logistics and Supply Chain
Rui Yan
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; School of Computer Science, Wuhan University, Wuhan, China