MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

📅 2025-09-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing evaluation frameworks for mobile GUI agents do not systematically assess how well agents coordinate GUI interactions with lightweight automation mechanisms such as APIs, deep links, and RPA scripts. Method: We introduce MAS-Bench, the first unified benchmark for GUI-shortcut hybrid mobile agents, comprising 139 complex tasks across 11 real-world applications and 88 predefined shortcuts. Rather than relying only on fixed, pre-specified shortcuts, it also evaluates whether agents can autonomously discover, generate, and reuse low-cost shortcuts. Seven metrics quantify task success rate, execution efficiency, and shortcut generation quality. Results: Shortcut-augmented agents achieve significantly higher success rates and efficiency than GUI-only agents, which supports the benchmark's usefulness for advancing efficient, autonomous mobile agents.

📝 Abstract

To enhance the efficiency of GUI agents on platforms such as smartphones and computers, a hybrid paradigm that combines flexible GUI operations with efficient shortcuts (e.g., APIs, deep links) is emerging as a promising direction. However, systematic benchmarking of these hybrid agents remains underexplored. As a first step toward bridging this gap, we introduce MAS-Bench, a benchmark that pioneers the evaluation of GUI-shortcut hybrid agents with a specific focus on the mobile domain. Beyond merely using predefined shortcuts, MAS-Bench assesses an agent's capability to autonomously generate shortcuts by discovering and creating reusable, low-cost workflows. It features 139 complex tasks across 11 real-world applications, a knowledge base of 88 predefined shortcuts (APIs, deep links, RPA scripts), and 7 evaluation metrics. The tasks are designed to be solvable via GUI-only operations, but can be significantly accelerated by intelligently embedding shortcuts. Experiments show that hybrid agents achieve significantly higher success rates and efficiency than their GUI-only counterparts. This result also demonstrates the effectiveness of our method for evaluating an agent's shortcut generation capabilities. MAS-Bench fills a critical evaluation gap, providing a foundational platform for future work on more efficient and robust intelligent agents.
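The hybrid idea described in the abstract, where a task stays solvable through GUI steps alone but runs of steps can be replaced by a shortcut, can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in for illustration (the `Shortcut` class, the `plan_hybrid` function, and the step and shortcut names); it is not MAS-Bench's actual interface or any agent's real planner.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Shortcut:
    """Hypothetical knowledge-base entry: one call replaces a run of GUI steps."""
    name: str
    covers: List[str]  # the GUI steps this shortcut stands in for, in order


def plan_hybrid(gui_steps: List[str], shortcuts: List[Shortcut]) -> List[Tuple[str, str]]:
    """Greedily swap runs of GUI steps for the longest matching shortcut.

    Steps without a matching shortcut stay as plain GUI actions, so the plan
    degrades to the GUI-only plan when the knowledge base is empty.
    """
    plan: List[Tuple[str, str]] = []
    i = 0
    while i < len(gui_steps):
        best = None
        for sc in shortcuts:
            n = len(sc.covers)
            if n > 0 and gui_steps[i:i + n] == sc.covers:
                if best is None or n > len(best.covers):
                    best = sc
        if best is not None:
            plan.append(("shortcut", best.name))
            i += len(best.covers)
        else:
            plan.append(("gui", gui_steps[i]))
            i += 1
    return plan


# Illustrative, made-up task: four GUI steps collapse into one deep link.
gui_only = ["open_maps", "tap_search", "type_destination", "tap_directions", "tap_start"]
kb = [Shortcut("maps_directions_deeplink",
               ["open_maps", "tap_search", "type_destination", "tap_directions"])]
hybrid_plan = plan_hybrid(gui_only, kb)
```

In this toy case the hybrid plan has two actions instead of five, which is the kind of step reduction the benchmark's efficiency metrics are meant to capture. A real agent would also need to handle shortcut failures by falling back to GUI operations.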

Problem

Research questions and friction points this paper is trying to address.

No existing benchmark for evaluating GUI-shortcut hybrid agents

Need to assess agents' ability to generate shortcuts autonomously

Need for unified testing across multiple mobile applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid GUI-shortcut benchmark for mobile agents

Autonomous shortcut generation and workflow creation

Comprehensive evaluation with multiple metrics and tasks
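As a rough illustration of how two of the metric families named above (task success rate and execution efficiency) might be aggregated, here is a minimal sketch. The `Episode` record and both functions are assumptions made for illustration, and "mean steps on successful tasks" is only one possible proxy for efficiency; the source does not define the seven metrics, so this should not be read as the benchmark's formulas.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """Hypothetical log record for one agent attempt at one task."""
    task_id: str
    success: bool
    steps: int  # number of actions (GUI operations or shortcut calls) taken


def success_rate(episodes: List[Episode]) -> float:
    """Fraction of episodes that completed their task."""
    if not episodes:
        return 0.0
    return sum(1 for e in episodes if e.success) / len(episodes)


def mean_steps_on_success(episodes: List[Episode]) -> float:
    """Average action count over successful episodes (an assumed efficiency proxy)."""
    counts = [e.steps for e in episodes if e.success]
    if not counts:
        return float("nan")
    return sum(counts) / len(counts)
```

Comparing these two numbers for a GUI-only run and a hybrid run over the same task set gives a simple side-by-side view of the success and efficiency gap the paper reports.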

🔎 Similar Papers

Benchmarking Mobile Device Control Agents across Diverse Configurations