🤖 AI Summary
Existing mobile GUI-control agents lack large-scale, hierarchically annotated datasets that support multi-step interaction and semantic understanding. To address this, the authors propose AMEX, a large-scale, general-purpose dataset for mobile GUI control. It comprises over 104K high-resolution Android screenshots annotated at three levels: (1) grounding of interactive GUI elements, (2) functionality descriptions of screens and elements, and (3) natural language instructions paired with stepwise GUI-action chains. This hierarchical annotation complements conventional single-level labeling and is intended to improve task interpretability and cross-application generalization. The data are collected from real-world apps and annotated by human experts. Fine-tuning the SPHINX Agent baseline on AMEX yields higher cross-app task completion rates and instruction-following accuracy than the baselines it is compared against. The AMEX dataset is publicly released.
📝 Abstract
AI agents have drawn increasing attention, mostly for their ability to perceive environments, understand tasks, and autonomously achieve goals. To advance research on AI agents in mobile scenarios, we introduce the Android Multi-annotation EXpo (AMEX), a comprehensive, large-scale dataset designed for generalist mobile GUI-control agents, which complete tasks by directly interacting with the graphical user interface (GUI) on mobile devices. AMEX comprises over 104K high-resolution screenshots from popular mobile applications, annotated at multiple levels. Unlike existing GUI-related datasets such as Rico and AitW, AMEX includes three levels of annotations: GUI interactive element grounding, GUI screen and element functionality descriptions, and complex natural language instructions with stepwise GUI-action chains. We develop this dataset from a more instructive and detailed perspective, complementing the general settings of existing datasets. Additionally, we fine-tune a baseline model, SPHINX Agent, and illustrate the effectiveness of AMEX. The project is available at https://yxchai.com/AMEX/.
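
To make the three annotation levels concrete, the sketch below shows one way they could be represented in memory. It is an illustration only, not the released AMEX schema: every class name, field name, the `(left, top, right, bottom)` box convention, the action types, and the `element_at` helper are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed convention: (left, top, right, bottom) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class ElementAnnotation:
    """Level 1 (grounding) plus the element part of level 2 (functionality)."""
    bbox: BBox
    functionality: Optional[str] = None


@dataclass
class ScreenAnnotation:
    """One screenshot with its screen-level description and its elements."""
    screenshot_path: str
    description: Optional[str] = None
    elements: List[ElementAnnotation] = field(default_factory=list)


@dataclass
class ActionStep:
    """One step of a GUI-action chain (level 3). Action types are hypothetical."""
    screenshot_path: str
    action_type: str
    target: Optional[BBox] = None
    text: Optional[str] = None


@dataclass
class InstructionEpisode:
    """A natural language instruction with its stepwise action chain."""
    instruction: str
    steps: List[ActionStep] = field(default_factory=list)


def element_at(screen: ScreenAnnotation, x: int, y: int) -> Optional[ElementAnnotation]:
    """Return the first annotated element whose box contains (x, y), else None."""
    for el in screen.elements:
        left, top, right, bottom = el.bbox
        if left <= x <= right and top <= y <= bottom:
            return el
    return None


# Made-up example data, for illustration only.
screen = ScreenAnnotation(
    screenshot_path="example.png",
    description="Search page of a shopping app",
    elements=[ElementAnnotation((40, 100, 680, 160), "Opens the search text field")],
)
episode = InstructionEpisode(
    instruction="Search for running shoes",
    steps=[
        ActionStep("example.png", "tap", target=(40, 100, 680, 160)),
        ActionStep("example.png", "type", text="running shoes"),
    ],
)
```

Keeping the action chain separate from the per-screen annotations mirrors the abstract's split between element- and screen-level labels and instruction-level labels; a real loader would follow whatever format the project page documents.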