FedMABench: Benchmarking Mobile Agents on Decentralized Heterogeneous User Data

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Centralized training of mobile agents is costly and scales poorly, and existing federated learning (FL) work lacks standardized benchmarks tailored to mobile environments. Method: This paper introduces FedMABench, the first benchmark for federated training and evaluation of mobile agents, designed specifically for heterogeneous scenarios. It examines how app-level data distributions and implicit correlations between apps from different categories affect statistical heterogeneity. Contribution/Results: FedMABench integrates 8 FL algorithms, 10+ base models, 6 datasets (comprising 30+ subsets), and 800+ mobile apps spanning 5 categories. Experiments show that federated algorithms consistently outperform local training. The benchmark code and datasets are publicly available.

📝 Abstract
Mobile agents have attracted tremendous research participation recently. Traditional approaches to mobile agent training rely on centralized data collection, leading to high cost and limited scalability. Distributed training utilizing federated learning offers an alternative by harnessing real-world user data, providing scalability and reducing costs. However, pivotal challenges, including the absence of standardized benchmarks, hinder progress in this field. To tackle the challenges, we introduce FedMABench, the first benchmark for federated training and evaluation of mobile agents, specifically designed for heterogeneous scenarios. FedMABench features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories, providing a comprehensive framework for evaluating mobile agents across diverse environments. Through extensive experiments, we uncover several key insights: federated algorithms consistently outperform local training; the distribution of specific apps plays a crucial role in heterogeneity; and even apps from distinct categories can exhibit correlations during training. FedMABench is publicly available at: https://github.com/wwh0411/FedMABench with the datasets at: https://huggingface.co/datasets/wwh0411/FedMABench.
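To make the contrast between federated and local training concrete, here is a minimal sketch of one server-side aggregation round in the style of federated averaging (FedAvg). This is an illustration only: the source does not list which eight algorithms FedMABench includes, and the function name `fedavg`, the `Params` type, and the toy client data are all hypothetical and are not taken from the FedMABench codebase.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Weighted mean of each coordinate across all clients.
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Toy example (assumed data): two users with 1 and 3 local episodes.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
sizes = [1, 3]
global_params = fedavg(clients, sizes)
```

In local training, each user would keep only their own parameters; in the federated setting, only parameter updates like those above leave the device, while the raw user data stays where it was collected.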
Problem

Research questions and friction points this paper is trying to address.

Absence of standardized benchmarks for federated training of mobile agents.
Evaluation of mobile agents on heterogeneous, decentralized user data.
Performance comparison of federated algorithms against local training.
Innovation

Methods, ideas, or system contributions that make the work stand out.

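Introduces FedMABench, the first benchmark for federated training and evaluation of mobile agents.
Features 6 datasets with 30+ subsets, 8 federated algorithms, 10+ base models, and over 800 apps across 5 categories.
Publicly releases the benchmark and its datasets for comprehensive evaluation.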
Wenhao Wang
Zhejiang University, Shanghai AI Laboratory, Hangzhou, China
Zijie Yu
Shanghai Jiao Tong University
agent, LLM
Rui Ye
Shanghai Jiao Tong University, Shanghai, China
Jianqing Zhang
Shanghai Jiao Tong University, Shanghai, China
Siheng Chen
Shanghai Jiao Tong University
Collective intelligence, LLM agent, graph signal processing, collaborative perception
Yanfeng Wang
Shanghai Jiao Tong University