Mobile-Agent-RAG: Driving Smart Multi-Agent Coordination with Contextual Knowledge Empowerment for Long-Horizon Mobile Automation

📅 2025-11-15
🤖 AI Summary
Current mobile agents exhibit low success rates on long-horizon, cross-application tasks, primarily due to overreliance on static large language model (LLM) knowledge, which causes high-level planning hallucinations and low-level UI interaction errors. To address this, we propose a two-tier retrieval-augmented multi-agent framework that decouples high-level strategic planning from low-level interface execution. Specifically, we introduce Manager-RAG, a task-level strategy repository, and Operator-RAG, an atomic UI operation knowledge base, both integrated with semantic retrieval and context-aware coordination. All knowledge entries are manually verified to substantially mitigate hallucinations and improve action precision. Evaluated on our newly constructed benchmark Mobile-Eval-RAG, our approach achieves an 11.0% absolute gain in task completion rate and a 10.2% improvement in step efficiency, significantly outperforming state-of-the-art methods.

📝 Abstract
Mobile agents show immense potential, yet current state-of-the-art (SoTA) agents exhibit inadequate success rates on real-world, long-horizon, cross-application tasks. We attribute this bottleneck to the agents' excessive reliance on static, internal knowledge within MLLMs, which leads to two critical failure points: 1) strategic hallucinations in high-level planning and 2) operational errors during low-level execution on user interfaces (UI). The core insight of this paper is that high-level planning and low-level UI operations require fundamentally distinct types of knowledge. Planning demands high-level, strategy-oriented experiences, whereas operations necessitate low-level, precise instructions closely tied to specific app UIs. Motivated by these insights, we propose Mobile-Agent-RAG, a novel hierarchical multi-agent framework that innovatively integrates dual-level retrieval augmentation. At the planning stage, we introduce Manager-RAG to reduce strategic hallucinations by retrieving human-validated comprehensive task plans that provide high-level guidance. At the execution stage, we develop Operator-RAG to improve execution accuracy by retrieving the most precise low-level guidance for accurate atomic actions, aligned with the current app and subtask. To accurately deliver these knowledge types, we construct two specialized retrieval-oriented knowledge bases. Furthermore, we introduce Mobile-Eval-RAG, a challenging benchmark for evaluating such agents on realistic multi-app, long-horizon tasks. Extensive experiments demonstrate that Mobile-Agent-RAG significantly outperforms SoTA baselines, improving task completion rate by 11.0% and step efficiency by 10.2%, establishing a robust paradigm for context-aware, reliable multi-agent mobile automation.
Problem

Research questions and friction points this paper is trying to address.

Addresses mobile agents' low success rates in long-horizon cross-application tasks
Reduces strategic hallucinations in planning and operational UI execution errors
Enhances multi-agent coordination with contextual knowledge for mobile automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical multi-agent framework with dual-level retrieval augmentation
Manager-RAG retrieves human-validated task plans for strategic guidance
Operator-RAG retrieves precise UI instructions for accurate execution
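The dual-level retrieval described above can be illustrated with a minimal sketch. The paper does not publish its retrieval code, so everything here is hypothetical: the `manager_kb`/`operator_kb` entries, the `retrieve` helper, and the bag-of-words similarity (a stand-in for whatever learned semantic encoder the authors actually use). The point is only the structure: one knowledge base queried with the full task for a high-level plan, a second queried with the current app and subtask for atomic UI guidance.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a stand-in for a real semantic encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, kb: list[dict], top_k: int = 1) -> list[dict]:
    # Rank knowledge entries by similarity between the query and each entry's key.
    q = embed(query)
    ranked = sorted(kb, key=lambda e: cosine(q, embed(e["key"])), reverse=True)
    return ranked[:top_k]

# Hypothetical knowledge bases mirroring the paper's two tiers.
manager_kb = [  # task-level strategies (Manager-RAG)
    {"key": "book a restaurant then share with a friend",
     "value": "Plan: open booking app -> reserve table -> switch to chat app -> send details"},
    {"key": "set an alarm for tomorrow morning",
     "value": "Plan: open clock app -> add alarm -> confirm time"},
]
operator_kb = [  # atomic UI operations tied to specific apps (Operator-RAG)
    {"key": "chat app send message",
     "value": "Tap the input field, type the text, tap the send button"},
    {"key": "clock app add alarm",
     "value": "Tap '+', scroll the time picker, tap 'Save'"},
]

# Planning stage: the whole task query retrieves a high-level plan.
task = "book dinner and share the reservation with a friend"
plan = retrieve(task, manager_kb)[0]["value"]
# Execution stage: the current app + subtask retrieves precise UI guidance.
step = retrieve("chat app send message", operator_kb)[0]["value"]
```

Keeping the two knowledge bases separate reflects the paper's core insight: planning and UI execution need fundamentally different knowledge, so mixing strategy entries and atomic-action entries in one index would let the wrong granularity of guidance surface at either stage.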
Yuxiang Zhou
Postdoctoral Researcher, Queen Mary University of London
Natural Language Processing, Large Language Models
Jichang Li
Assistant Researcher, Pengcheng Lab
Agentic Vision, Embodied AI, Visual Content Understanding, Weakly-supervised Learning
Yanhao Zhang
OPPO AI Center, OPPO Inc., China
Haonan Lu
OPPO AI Center, OPPO Inc., China
Guanbin Li
School of Computer Science and Engineering, Sun Yat-sen University