MobiAgent: A Systematic Framework for Customizable Mobile Agents

📅 2025-08-30
🤖 AI Summary
GUI-driven mobile agents face core challenges in real-world deployment, including low task-execution accuracy, inefficient reasoning, and a scarcity of high-quality annotated data. To address these issues, this paper proposes MobiAgent, a holistic system framework comprising: (1) MobiMind, a family of mobile-optimized vision-language agent models; (2) AgentRR, a framework that accelerates agent task execution; (3) an automated data collection pipeline supporting both self-labeling and synthetic data generation; and (4) MobiFlow, a lightweight, multi-task benchmark suite. Experiments on real devices show that MobiAgent outperforms both general-purpose large language models and state-of-the-art GUI agents, achieving state-of-the-art accuracy and inference speed. The data collection pipeline also reduces human annotation cost by 67%.

📝 Abstract
With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models continue to face significant challenges in real-world task execution, particularly in terms of accuracy and efficiency. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system comprising three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, recognizing that the capabilities of current mobile agents are still limited by the availability of high-quality data, we have developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared to both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.
Problem

Research questions and friction points this paper is trying to address.

Improving mobile agent accuracy and efficiency in real-world tasks
Addressing limitations of existing GUI-based mobile agent models
Overcoming high-quality data scarcity for mobile agent training
Innovation

Methods, ideas, or system contributions that make the work stand out.

MobiMind agent models for mobile intelligence
AgentRR framework accelerates task execution
AI-assisted data pipeline reduces annotation cost
Authors

Cheng Zhang, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University
Erhu Feng, Shanghai Jiao Tong University (MLSys, Operating Systems, Architecture)
Xi Zhao, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University
Yisheng Zhao, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University
Wangbo Gong, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University
Jiahui Sun, Shanghai Jiao Tong University (Systems)
Dong Du, Associate Professor, Nanjing University of Science and Technology (Computer Graphics, 3D Computer Vision)
Zhichao Hua, Associate Professor, Shanghai Jiao Tong University (Operating Systems, Architectures, Hardware/Software Co-design, Systems and Architectures for LLMs)
Yubin Xia, Professor, Shanghai Jiao Tong University (Operating Systems, Virtualization, Computer Architecture, System Security)
Haibo Chen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University