MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

📅 2025-09-08
🤖 AI Summary
Problem: Existing evaluation frameworks for mobile GUI agents do not systematically assess how well an agent coordinates GUI interactions with lightweight automation mechanisms such as APIs, deep links, and RPA scripts. Method: We introduce the first unified benchmark for mobile agents that combine GUI operations with shortcuts, comprising 139 real-world, complex tasks and 88 predefined shortcuts. Rather than relying only on fixed, pre-specified shortcuts, it systematically evaluates an agent's ability to autonomously discover, generate, and reuse low-cost shortcuts. We design seven metrics that quantify task success rate, execution efficiency, and shortcut generation quality. Results: In our experiments, shortcut-augmented agents achieve significantly higher success rates and efficiency than GUI-only agents, which supports the benchmark's usefulness for evaluating shortcut generation and for advancing efficient, autonomous mobile agents.

📝 Abstract
To enhance the efficiency of GUI agents on various platforms like smartphones and computers, a hybrid paradigm that combines flexible GUI operations with efficient shortcuts (e.g., APIs, deep links) is emerging as a promising direction. However, a framework for systematically benchmarking these hybrid agents is still underexplored. To take the first step in bridging this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain. Beyond merely using predefined shortcuts, MAS-Bench assesses an agent's capability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep links, RPA scripts), and 7 evaluation metrics. The tasks are designed to be solvable via GUI-only operations, but can be significantly accelerated by intelligently embedding shortcuts. Experiments show that hybrid agents achieve significantly higher success rates and efficiency than their GUI-only counterparts. This result also demonstrates the effectiveness of our method for evaluating an agent's shortcut generation capabilities. MAS-Bench fills a critical evaluation gap, providing a foundational platform for future advancements in creating more efficient and robust intelligent agents.
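The hybrid idea in the abstract (take a shortcut when one covers a subgoal, otherwise fall back to GUI operations) can be illustrated with a minimal Python sketch. Everything here is a hypothetical illustration, not MAS-Bench's interface: the `Shortcut` class, the subgoal names, the knowledge-base entries, and the per-subgoal GUI step counts are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Shortcut:
    """Hypothetical shortcut record; not the benchmark's actual schema."""
    name: str     # made-up identifier
    kind: str     # "api", "deep_link", or "rpa_script"
    subgoal: str  # the subgoal this shortcut completes in a single call


# Assumed number of GUI actions needed per subgoal (illustrative values only).
GUI_STEPS = {"open_wifi_settings": 4, "scroll_feed": 3, "compose_email": 6}

# Toy knowledge base of predefined shortcuts (hypothetical entries).
KNOWLEDGE_BASE = [
    Shortcut("wifi_settings_link", "deep_link", "open_wifi_settings"),
    Shortcut("send_mail_api", "api", "compose_email"),
]


def find_shortcut(subgoal: str, kb: List[Shortcut]) -> Optional[Shortcut]:
    """Return the first shortcut that covers the subgoal, if any."""
    for shortcut in kb:
        if shortcut.subgoal == subgoal:
            return shortcut
    return None


def plan(subgoals: List[str], kb: List[Shortcut]) -> Tuple[List[str], int]:
    """Prefer a shortcut (counted as 1 step); otherwise use GUI operations."""
    actions, total_steps = [], 0
    for subgoal in subgoals:
        shortcut = find_shortcut(subgoal, kb)
        if shortcut is not None:
            actions.append(f"{shortcut.kind}:{shortcut.name}")
            total_steps += 1
        else:
            actions.append(f"gui:{subgoal}")
            total_steps += GUI_STEPS[subgoal]
    return actions, total_steps


task = ["open_wifi_settings", "scroll_feed", "compose_email"]
hybrid_actions, hybrid_steps = plan(task, KNOWLEDGE_BASE)
gui_actions, gui_steps = plan(task, [])  # empty knowledge base = GUI-only agent
print(hybrid_actions, hybrid_steps)
print(gui_actions, gui_steps)
```

With an empty knowledge base the same planner degrades to a GUI-only agent, which mirrors the abstract's point that tasks remain solvable through GUI operations alone while shortcuts reduce the number of steps.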
Problem

Research questions and friction points this paper is trying to address.

Lack of a benchmark for evaluating GUI-shortcut hybrid agents
Need to assess autonomous shortcut generation capabilities
Need for unified testing across multiple mobile applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid GUI-shortcut benchmark for mobile agents
Autonomous shortcut generation and workflow creation
Comprehensive evaluation with multiple metrics and tasks
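
As a rough illustration of how success and efficiency might be scored, the sketch below computes three simple aggregates over episode logs. This page does not define the benchmark's seven metrics, so the `Episode` fields, the three functions, and the toy episode values are all assumptions made for the example; they are not the paper's metric definitions or results.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class Episode:
    """Hypothetical per-task log entry; field names are assumptions."""
    success: bool
    steps: int           # total actions taken (GUI operations + shortcut calls)
    shortcut_calls: int  # how many of those actions were shortcut calls


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes that completed the task."""
    return sum(e.success for e in episodes) / len(episodes)


def mean_steps_on_success(episodes: List[Episode]) -> float:
    """Average step count over successful episodes only."""
    steps = [e.steps for e in episodes if e.success]
    return mean(steps) if steps else float("nan")


def shortcut_ratio(episodes: List[Episode]) -> float:
    """Share of all actions that were shortcut calls."""
    total = sum(e.steps for e in episodes)
    return sum(e.shortcut_calls for e in episodes) / total if total else 0.0


# Toy placeholder logs, not experimental data.
episodes = [
    Episode(success=True, steps=10, shortcut_calls=0),
    Episode(success=False, steps=20, shortcut_calls=0),
    Episode(success=True, steps=4, shortcut_calls=2),
    Episode(success=True, steps=6, shortcut_calls=3),
]
print(success_rate(episodes))
print(mean_steps_on_success(episodes))
print(shortcut_ratio(episodes))
```

Restricting the step average to successful episodes keeps failed runs that hit a step limit from distorting the efficiency figure; whether the benchmark itself does this is not stated here.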
Authors

Pengxiang Zhao, Zhejiang University (LLM, AI Agent)
Guangyi Liu, Zhejiang University
Yaozhen Liang, Zhejiang University
Weiqing He, University of Pennsylvania
Zhengxi Lu, Zhejiang University (MLLM, Agent)
Yuehao Huang, Zhejiang University
Yaxuan Guo, vivo AI Lab (UI Agent, Mobile Agent)
Kexin Zhang, Tsinghua University (Data Mining, Machine Learning)
Hao Wang, vivo AI Lab
Liang Liu, vivo AI Lab
Yong Liu, Zhejiang University