AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation

📅 2024-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Mobile UI automation must translate natural language instructions into executable GUI scripts efficiently while preserving user privacy and minimizing energy consumption. Method: This paper proposes a privacy-preserving, low-power on-device Small Language Model (SLM) framework for mobile UI automation. It introduces a documentation-centric synthetic data generation paradigm: it automatically constructs fine-grained app API documentation and uses it to synthesize diverse, high-quality training data. The framework integrates instruction tuning, context-guided code generation, and a lightweight on-device code interpreter to enable SLM-driven local code synthesis. Contribution/Results: Experiments show higher task success rates, lower inference latency, and lower token consumption across multiple mobile UI automation tasks. The approach outperforms existing state-of-the-art UI agents and supports fully offline, local deployment, which preserves data privacy and energy efficiency without cloud dependency.

📝 Abstract
Large language models (LLMs) have brought exciting new advances to mobile UI agents, a long-standing research field that aims to complete arbitrary natural language tasks through mobile UI interactions. However, existing UI agents usually demand the high reasoning capabilities of powerful large models that are difficult to deploy locally on end-users' devices, which raises serious concerns about user privacy and centralized serving cost. One way to reduce the required model size is to customize a smaller domain-specific model with high-quality training data, e.g. large-scale human demonstrations of diverse types of apps and tasks, but such datasets are extremely difficult to obtain. Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem to a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter. Unlike normal coding tasks, for which models can be extensively pretrained on public datasets, generating UI automation code is challenging due to the diversity, complexity, and variability of target apps. Therefore, we adopt a document-centered approach that automatically builds fine-grained API documentation for each app and generates diverse task samples based on this documentation. Guided by the synthetic documents and task samples, the agent learns to generate precise and efficient scripts to complete unseen tasks. Based on detailed comparisons with state-of-the-art mobile UI agents, our approach effectively improves mobile task automation with significantly higher success rates and lower latency/token consumption. Code will be open-sourced.
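
To make the code-generation framing concrete, here is a minimal sketch of how a script written against an app's API document might be run by a small interpreter. It is not the paper's implementation: the `APP_DOC` entries, the `Element` and `FakeDevice` classes, the `run_script` helper, and the script itself are all hypothetical names chosen for illustration, and the device only records actions instead of driving a real phone.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Stand-in for a real phone: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def tap(self, selector: str) -> None:
        self.log.append(("tap", selector))

    def set_text(self, selector: str, text: str) -> None:
        self.log.append(("set_text", selector, text))


# Hypothetical app "API document": element name -> (selector, description).
APP_DOC = {
    "new_note_button": ("id/new_note", "Opens an empty note"),
    "note_title_input": ("id/title", "Text field for the note title"),
    "save_button": ("id/save", "Saves the current note"),
}


class Element:
    """Handle for one documented UI element, exposed to generated scripts."""

    def __init__(self, device: FakeDevice, selector: str) -> None:
        self._device = device
        self._selector = selector

    def tap(self) -> None:
        self._device.tap(self._selector)

    def set_text(self, text: str) -> None:
        self._device.set_text(self._selector, text)


def run_script(script: str, device: FakeDevice) -> None:
    """Execute a script with only the documented elements in scope."""
    scope = {name: Element(device, sel) for name, (sel, _) in APP_DOC.items()}
    scope["__builtins__"] = {}  # illustrative restriction, not a real sandbox
    exec(script, scope)


# Assumed example of what a model might emit for "create a note titled Groceries".
GENERATED_SCRIPT = """
new_note_button.tap()
note_title_input.set_text("Groceries")
save_button.tap()
"""

device = FakeDevice()
run_script(GENERATED_SCRIPT, device)
print(device.log)
```

In this sketch the whole task is expressed as one short script, so the model is queried once rather than once per UI step. A script that refers to an element missing from the document fails with a `NameError` before any action is recorded.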
Problem

Research questions and friction points this paper is trying to address.

Privacy Protection
Power Efficiency
Natural Language Processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Small Language Model
Code Generation
Mobile Devices
Hao Wen
Institute for AI Industry Research (AIR), Tsinghua University
Shizuo Tian
Tsinghua University
Borislav Pavlov
Institute for AI Industry Research (AIR), Tsinghua University
Wenjie Du
Institute for AI Industry Research (AIR), Tsinghua University
Yixuan Li
Institute for AI Industry Research (AIR), Tsinghua University
Ge Chang
Institute for AI Industry Research (AIR), Tsinghua University
Shanhui Zhao
Institute for AI Industry Research (AIR), Tsinghua University
Artificial Intelligence, Software Testing, LLM-based Agent, Edge Computing
Jiacheng Liu
Institute for AI Industry Research (AIR), Tsinghua University
Yunxin Liu
IEEE Fellow, Guoqiang Professor, Institute for AI Industry Research (AIR), Tsinghua University
Mobile Computing, Edge Computing, AIoT, System, Networking
Ya-Qin Zhang
Institute for AI Industry Research (AIR), Tsinghua University
Yuanchun Li
Institute for AI Industry Research (AIR), Tsinghua University
mobile computing, artificial intelligence