CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs

📅 2025-10-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Mobile agents rely on large language models (LLMs) to execute smartphone UI tasks, yet purely cloud-based approaches require uploading full UI states—posing significant privacy risks—while purely on-device solutions suffer from limited model capability, resulting in low task success rates. This paper proposes CORE, the first privacy-aware cloud-edge collaborative mobile agent framework. CORE introduces layout-aware XML block partitioning to minimize redundant UI state transmission, a synergistic planning and joint decision-making mechanism, and multi-round cumulative reasoning to mitigate local misjudgments. Extensive experiments across diverse mobile applications demonstrate that CORE reduces exposed UI elements by up to 55.6% compared to cloud-only baselines, while achieving task success rates nearly matching those of the cloud-only approach. CORE thus significantly advances the privacy–performance trade-off in mobile agent systems.

📝 Abstract
Mobile agents rely on Large Language Models (LLMs) to plan and execute tasks on smartphone user interfaces (UIs). While cloud-based LLMs achieve high task accuracy, they require uploading the full UI state at every step, exposing unnecessary and often irrelevant information. In contrast, local LLMs avoid UI uploads but suffer from limited capacity, resulting in lower task success rates. We propose **CORE**, a **CO**llaborative framework that combines the strengths of cloud and local LLMs to **R**educe UI **E**xposure, while maintaining task accuracy for mobile agents. CORE comprises three key components: (1) **Layout-aware block partitioning**, which groups semantically related UI elements based on the XML screen hierarchy; (2) **Co-planning**, where local and cloud LLMs collaboratively identify the current sub-task; and (3) **Co-decision-making**, where the local LLM ranks relevant UI blocks, and the cloud LLM selects specific UI elements within the top-ranked block. CORE further introduces a multi-round accumulation mechanism to mitigate local misjudgment or limited context. Experiments across diverse mobile apps and tasks show that CORE reduces UI exposure by up to 55.6% while maintaining task success rates slightly below cloud-only agents, effectively mitigating unnecessary privacy exposure to the cloud. The code is available at https://github.com/Entropy-Fighter/CORE.
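The layout-aware block partitioning step can be illustrated with a toy sketch: elements in an Android-style UI XML dump are grouped under their nearest container, so each block holds semantically related elements. The XML snippet, function name, and grouping rule below are illustrative assumptions, not the paper's actual implementation (real hierarchies, e.g. from `uiautomator dump`, are far larger and deeper).

```python
import xml.etree.ElementTree as ET

# Toy UI hierarchy (hypothetical; real Android dumps are much larger).
UI_XML = """
<hierarchy>
  <node class="android.widget.LinearLayout" resource-id="search_bar">
    <node class="android.widget.EditText" text="Search"/>
    <node class="android.widget.Button" text="Go"/>
  </node>
  <node class="android.widget.LinearLayout" resource-id="contact_list">
    <node class="android.widget.TextView" text="Alice"/>
    <node class="android.widget.TextView" text="Bob"/>
  </node>
</hierarchy>
"""

def partition_blocks(xml_text):
    """Group leaf UI elements under their top-level container node,
    so each block bundles semantically related elements."""
    root = ET.fromstring(xml_text)
    blocks = {}
    for container in root:
        block_id = container.get("resource-id", "anonymous")
        blocks[block_id] = [
            leaf.get("text") for leaf in container.iter("node")
            if len(leaf) == 0  # keep leaves only
        ]
    return blocks

blocks = partition_blocks(UI_XML)
# e.g. {"search_bar": ["Search", "Go"], "contact_list": ["Alice", "Bob"]}
```

Grouping by container follows the intuition in the abstract: elements that share a layout subtree (a search bar, a contact list) tend to belong to the same interaction context, so a whole block can be ranked or withheld as one unit.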
Problem

Research questions and friction points this paper is trying to address.

Reducing UI exposure in mobile agents via cloud-local collaboration
Maintaining task accuracy while minimizing unnecessary information upload
Balancing privacy protection with mobile agent performance requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative framework combining cloud and local LLMs
Layout-aware block partitioning groups related UI elements
Co-planning and co-decision-making between local and cloud LLMs, backed by multi-round accumulation to offset local misjudgments
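The co-decision split above can be sketched as a two-stage pipeline: the on-device model ranks blocks, and only the top-ranked block is uploaded for the cloud model to choose a concrete element. Both "LLMs" below are trivial keyword-matching stand-ins, and all names and data are hypothetical; the point is the data flow, where the cloud never sees blocks the local model ranked lower.

```python
def local_rank_blocks(sub_task, blocks):
    """Stand-in for the on-device LLM: rank UI blocks by relevance to
    the current sub-task (here, a naive keyword-overlap score)."""
    def score(items):
        return sum(sub_task.lower().count(t.lower()) for t in items if t)
    return sorted(blocks, key=lambda b: score(blocks[b]), reverse=True)

def cloud_select_element(sub_task, block_items):
    """Stand-in for the cloud LLM: pick one element inside the single
    uploaded block (here, the first keyword match)."""
    for item in block_items:
        if item and item.lower() in sub_task.lower():
            return item
    return block_items[0]

blocks = {"search_bar": ["Search", "Go"], "contact_list": ["Alice", "Bob"]}
sub_task = "Tap the contact Alice"

ranking = local_rank_blocks(sub_task, blocks)
top_block = ranking[0]                       # only this block leaves the device
action = cloud_select_element(sub_task, blocks[top_block])
exposed = len(blocks[top_block])             # 2 of 4 elements exposed to the cloud
```

With this split, the cloud sees 2 of the 4 on-screen elements in the toy example; the paper's reported reduction of up to 55.6% in exposed UI elements comes from the same mechanism operating on real app screens.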
👥 Authors
- Gucongcong Fan — Shanghai Jiao Tong University
- Chaoyue Niu — Shanghai Jiao Tong University (Device-Cloud ML, On-Device Intelligence)
- Chengfei Lyu — Alibaba Group
- Fan Wu — Shanghai Jiao Tong University
- Guihai Chen — Professor of Computer Science, Computer Science and Technology