🤖 AI Summary
GUI-driven mobile agents face three core challenges in real-world deployment: low task-execution accuracy, inefficient reasoning, and a scarcity of high-quality annotated data. To address these issues, the paper proposes MobiAgent, a holistic system framework comprising: (1) the MobiMind series of mobile-optimized vision-language agent models; (2) AgentRR, a GUI-structure-aware framework for accelerating agent reasoning; (3) an AI-assisted data collection pipeline supporting both self-labeling and synthetic data generation; and (4) MobiFlow, a lightweight, multi-task benchmarking suite. Extensive experiments show that MobiAgent significantly outperforms both general-purpose large language models and state-of-the-art GUI agents on real-device tasks, achieving state-of-the-art accuracy and inference speed while reducing human annotation cost by 67%.
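As a rough illustration of how these pieces could fit together, the sketch below models a single agent step in which a cached experience store short-circuits full model inference. Neither the summary nor the abstract specifies AgentRR's internals, so the record-and-replay cache here is only one plausible acceleration mechanism, and every name in the sketch (`Action`, `MobiMindModel`, `AgentRRCache`, `run_step`) is a hypothetical stand-in rather than the paper's actual API.

```python
from dataclasses import dataclass

# All class and method names below are illustrative inventions; the paper's
# real interfaces are not described in the summary or abstract.

@dataclass
class Action:
    kind: str        # e.g. "tap", "type", "scroll"
    target: str      # identifier of the GUI element to act on
    text: str = ""   # payload for "type" actions

class MobiMindModel:
    """Stand-in for a MobiMind-series VLM mapping (task, screen) -> action."""
    def propose(self, task: str, screen: str) -> Action:
        # A real model would run VLM inference on a screenshot here.
        return Action(kind="tap", target=f"element_for:{task}")

class AgentRRCache:
    """Stand-in for an acceleration layer that replays previously recorded
    actions, so the expensive model call is skipped on repeated states."""
    def __init__(self) -> None:
        self._traces: dict[tuple[str, str], Action] = {}

    def lookup(self, task: str, screen: str) -> Action | None:
        return self._traces.get((task, screen))

    def record(self, task: str, screen: str, action: Action) -> None:
        self._traces[(task, screen)] = action

def run_step(task: str, screen: str,
             model: MobiMindModel, cache: AgentRRCache) -> Action:
    """One step of the hypothesized accelerated agent loop."""
    cached = cache.lookup(task, screen)
    if cached is not None:
        return cached                     # replay: no model inference needed
    action = model.propose(task, screen)  # fall back to full reasoning
    cache.record(task, screen, action)    # record for future replay
    return action

if __name__ == "__main__":
    model, cache = MobiMindModel(), AgentRRCache()
    print(run_step("open settings", "home_screen", model, cache))  # model call
    print(run_step("open settings", "home_screen", model, cache))  # cache hit
```

In a real system the cache key would presumably be derived from GUI structure rather than a raw screen string, consistent with the summary's description of AgentRR as GUI-structure-aware.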
📝 Abstract
With the rapid advancement of Vision-Language Models (VLMs), GUI-based mobile agents have emerged as a key development direction for intelligent mobile systems. However, existing agent models continue to face significant challenges in real-world task execution, particularly in terms of accuracy and efficiency. To address these limitations, we propose MobiAgent, a comprehensive mobile agent system comprising three core components: the MobiMind-series agent models, the AgentRR acceleration framework, and the MobiFlow benchmarking suite. Furthermore, recognizing that the capabilities of current mobile agents are still limited by the availability of high-quality data, we have developed an AI-assisted agile data collection pipeline that significantly reduces the cost of manual annotation. Compared to both general-purpose LLMs and specialized GUI agent models, MobiAgent achieves state-of-the-art performance in real-world mobile scenarios.