UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents

📅 2025-05-27

🤖 AI Summary

GUI agents face two core challenges: trajectory outcomes are difficult to verify, and high-quality training data is difficult to scale. To address these, we propose UI-Genie, a self-improving framework. It introduces UI-Genie-RM, a reward model with an interleaved image-text architecture that processes historical context and unifies action-level and task-level reward modeling. To train it, we combine rule-based verification, controlled trajectory corruption, and hard negative mining. This produces UI-Genie-RM-517k, the first reward-specific dataset for GUI agents (517k reward annotations), together with UI-Genie-Agent-16k (16k synthetic trajectories generated without manual annotation). We further build a self-improvement pipeline in which reward-guided exploration and outcome verification in dynamic environments progressively expand the set of solvable complex tasks while improving both the agent and the reward model. After three generations of data-model self-improvement, UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks. The complete framework and generated datasets are publicly released.
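The reward-data construction step can be illustrated with a minimal sketch. It assumes a trajectory that has already been verified as successful (for example by rule-based checks) and a pool of candidate actions for each step. The names `Step`, `RewardSample`, `mine_hard_negatives`, `build_samples`, and `toy_score` are hypothetical. The specific corruption used here, dropping the final step to create an incomplete trajectory as a task-level negative, is an illustrative assumption rather than UI-Genie's exact procedure. `toy_score` stands in for the current reward model that ranks wrong actions during hard negative mining.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    screenshot_id: str  # placeholder for the screen observed at this step
    action: str         # action taken on that screen, e.g. "tap(wifi)"


@dataclass(frozen=True)
class RewardSample:
    task: str
    history: Tuple[Step, ...]  # steps taken before the judged point
    candidate: str             # action being judged; "" marks a task-level sample
    label: int                 # 1 = positive, 0 = negative


def mine_hard_negatives(candidates: List[str], correct: str,
                        score: Callable[[str], float], k: int) -> List[str]:
    """Return the k wrong actions that the current scorer rates highest."""
    wrong = [a for a in candidates if a != correct]
    return sorted(wrong, key=score, reverse=True)[:k]


def build_samples(task: str, steps: List[Step], candidates: List[List[str]],
                  score: Callable[[str], float], k: int = 1) -> List[RewardSample]:
    """Turn one verified-successful trajectory into labeled reward-model samples."""
    samples: List[RewardSample] = []
    for i, step in enumerate(steps):
        history = tuple(steps[:i])
        # Action-level positive: the action taken in the verified trajectory.
        samples.append(RewardSample(task, history, step.action, 1))
        # Action-level hard negatives: plausible but wrong alternatives at this step.
        for wrong in mine_hard_negatives(candidates[i], step.action, score, k):
            samples.append(RewardSample(task, history, wrong, 0))
    # Task-level positive: the complete trajectory.
    samples.append(RewardSample(task, tuple(steps), "", 1))
    # Task-level negative by corruption (an assumption in this sketch): drop the final step.
    if len(steps) > 1:
        samples.append(RewardSample(task, tuple(steps[:-1]), "", 0))
    return samples


def toy_score(action: str) -> float:
    # Hypothetical stand-in for the current reward model's action score.
    return float(len(action))


steps = [Step("home", "tap(settings)"), Step("settings", "tap(wifi)")]
candidates = [["tap(settings)", "tap(camera)", "swipe_up"],
              ["tap(wifi)", "tap(bluetooth)", "back"]]
samples = build_samples("Turn on Wi-Fi", steps, candidates, toy_score, k=1)
for s in samples:
    print(len(s.history), s.candidate or "<task>", s.label)
```

Mining the highest-scoring wrong actions, rather than sampling them at random, is what makes the negatives "hard": they are the mistakes the current scorer is most likely to confuse with correct actions.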

📝 Abstract

In this paper, we introduce UI-Genie, a self-improving framework addressing two key challenges in GUI agents: trajectory outcomes are difficult to verify, and high-quality training data is difficult to scale. These challenges are addressed by a reward model and a self-improving pipeline, respectively. The reward model, UI-Genie-RM, features an image-text interleaved architecture that efficiently processes historical context and unifies action-level and task-level rewards. To support the training of UI-Genie-RM, we develop deliberately designed data generation strategies, including rule-based verification, controlled trajectory corruption, and hard negative mining. To address the second challenge, a self-improvement pipeline progressively expands the set of solvable complex GUI tasks by enhancing both the agent and reward models through reward-guided exploration and outcome verification in dynamic environments. For training the models, we generate UI-Genie-RM-517k and UI-Genie-Agent-16k, establishing the first reward-specific dataset for GUI agents while demonstrating high-quality synthetic trajectory generation without manual annotation. Experimental results show that UI-Genie achieves state-of-the-art performance across multiple GUI agent benchmarks with three generations of data-model self-improvement. We open-source our complete framework implementation and generated datasets at https://github.com/Euphoria16/UI-Genie to facilitate further research.
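Reward-guided exploration with outcome verification can be sketched as a beam search. In this minimal sketch, a toy numeric "environment" replaces real GUI screens, `step_reward` plays the role of an action-level reward, and `is_success` plays the role of task-level outcome verification. All of these names, and the choice of beam search itself, are assumptions for illustration rather than the paper's implementation. Only trajectories that pass verification are returned, mimicking how verified rollouts could be added to the next generation's training data.

```python
from typing import Callable, Dict, List, Tuple

State = int
Trajectory = Tuple[str, ...]

# Toy "environment" (hypothetical): each action deterministically transforms the state.
ACTIONS: Dict[str, Callable[[State], State]] = {
    "add1": lambda s: s + 1,
    "double": lambda s: s * 2,
}


def explore(start: State,
            propose: Callable[[State], List[str]],
            step_reward: Callable[[State, str, State], float],
            is_success: Callable[[State], bool],
            beam_width: int,
            max_steps: int) -> List[Trajectory]:
    """Reward-guided beam search; returns only trajectories that pass outcome verification."""
    beam: List[Tuple[float, State, Trajectory]] = [(0.0, start, ())]
    verified: List[Trajectory] = []
    for _ in range(max_steps):
        expanded = []
        for total, state, traj in beam:
            for action in propose(state):
                nxt = ACTIONS[action](state)
                expanded.append((total + step_reward(state, action, nxt), nxt, traj + (action,)))
        # Keep the highest cumulative action-level reward (sort is stable on ties).
        expanded.sort(key=lambda item: item[0], reverse=True)
        beam = expanded[:beam_width]
        # Task-level outcome verification: finished trajectories leave the beam.
        verified.extend(traj for _, state, traj in beam if is_success(state))
        beam = [item for item in beam if not is_success(item[1])]
        if not beam:
            break
    return verified


TARGET = 6
verified = explore(
    start=1,
    propose=lambda s: list(ACTIONS),
    step_reward=lambda s, a, nxt: -abs(TARGET - nxt),  # toy action-level reward
    is_success=lambda s: s == TARGET,                   # toy outcome verifier
    beam_width=2,
    max_steps=4,
)
print(verified)
```

Beam search keeps only the top-scoring partial trajectories, so it can miss shorter solutions: here `add1, add1, double` reaches 6 in three steps but is pruned after step two. The verifier, not the step reward, decides which trajectories are kept.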

Problem

Research questions and friction points this paper is trying to address.

Addresses the difficulty of verifying GUI agent trajectory outcomes

Tackles the lack of scalable, high-quality training data

Improves agent performance through a self-improving pipeline and a reward model

Innovation

Methods, ideas, or system contributions that make the work stand out.

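Self-improving framework that iteratively enhances both the agent and the reward model

Image-text interleaved reward model architecture that unifies action-level and task-level rewards

Reward-guided exploration with outcome verification in dynamic environments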
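One way to unify action-level and task-level rewards in a single interleaved image-text model is to ask the same model two kinds of yes/no questions over the same history. The sketch below is a minimal illustration under that assumption. It presumes a backbone that returns logits for "yes" and "no". `Image`, `build_interleaved_input`, `reward`, `GOOD_ACTIONS`, and `toy_backbone` are hypothetical names, and the toy backbone is a rule-based stand-in for an MLLM, not UI-Genie-RM.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Image:
    id: str  # placeholder for screenshot pixels / visual tokens


Segment = Union[str, Image]
Backbone = Callable[[List[Segment]], Tuple[float, float]]  # returns (logit_yes, logit_no)


def build_interleaved_input(task: str, screenshots: List[Image], actions: List[str],
                            candidate: Optional[str] = None) -> List[Segment]:
    """Interleave each past screenshot with the action taken on it, then end with a query.

    candidate=None asks a task-level question; otherwise an action-level one.
    """
    if len(screenshots) != len(actions) + 1:
        raise ValueError("need one screenshot per past action plus the current screen")
    seq: List[Segment] = [f"Task: {task}"]
    for img, act in zip(screenshots, actions):
        seq.extend([img, f"Action: {act}"])
    seq.append(screenshots[-1])  # current screen
    if candidate is None:
        seq.append("Question: Is the task complete? Answer yes or no.")
    else:
        seq.append(f"Question: Is the next action '{candidate}' correct? Answer yes or no.")
    return seq


def reward(seq: List[Segment], backbone: Backbone) -> float:
    """Reward = P(yes), a two-way softmax over the backbone's answer logits."""
    logit_yes, logit_no = backbone(seq)
    return 1.0 / (1.0 + math.exp(logit_no - logit_yes))


# Hypothetical rule-based stand-in for an MLLM, used only to make the sketch runnable.
GOOD_ACTIONS = {"tap(settings)", "tap(wifi)"}


def toy_backbone(seq: List[Segment]) -> Tuple[float, float]:
    question, current = str(seq[-1]), seq[-2]
    if "task complete" in question:
        good = isinstance(current, Image) and current.id == "wifi_on"
    else:
        good = any(f"'{a}'" in question for a in GOOD_ACTIONS)
    return (2.0, 0.0) if good else (0.0, 2.0)


shots = [Image("home"), Image("settings"), Image("wifi_on")]
acts = ["tap(settings)", "tap(wifi)"]
task_seq = build_interleaved_input("Turn on Wi-Fi", shots, acts)
r_task = reward(task_seq, toy_backbone)  # task-level reward
r_good = reward(build_interleaved_input("Turn on Wi-Fi", shots[:1], [], "tap(settings)"), toy_backbone)
r_bad = reward(build_interleaved_input("Turn on Wi-Fi", shots[:1], [], "tap(camera)"), toy_backbone)
print(round(r_task, 3), round(r_good, 3), round(r_bad, 3))
```

Because both reward types share one input format and one output head, a single model can score individual steps during exploration and judge final outcomes during verification.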

🔎 Similar Papers

AppAgent v2: Advanced Agent for Flexible Mobile Interactions