🤖 AI Summary
This work addresses sparse rewards and ambiguous credit assignment in long-horizon mobile GUI agent tasks, which hinder effective learning from failed trajectories. The authors propose a two-stage self-evolution framework: the first stage uses rejection fine-tuning (RFT) to drive autonomous co-evolution of data and model; the second stage applies group-relative self-distillation (GRSD) to extract dense step-level supervision from unlabeled successful trajectories and correct erroneous ones. The authors claim this is the first fully autonomous, annotation-free continuous evolution of GUI agents. On the AndroidWorld benchmark, a 4B-parameter model attains an 81.0% Pass@1 success rate, surpassing existing baselines and even human performance.
📝 Abstract
Autonomous mobile GUI agents have attracted increasing attention with the advancement of Multimodal Large Language Models (MLLMs). However, existing methods still suffer from inefficient learning from failed trajectories and ambiguous credit assignment under sparse rewards for long-horizon GUI tasks. To this end, we propose UI-Voyager, a novel two-stage self-evolving mobile GUI agent. In the first stage, we employ Rejection Fine-Tuning (RFT), which enables the continuous co-evolution of data and models in a fully autonomous loop. The second stage introduces Group Relative Self-Distillation (GRSD), which identifies critical fork points in group rollouts and constructs dense step-level supervision from successful trajectories to correct failed ones. Extensive experiments on AndroidWorld show that our 4B model achieves an 81.0% Pass@1 success rate, outperforming numerous recent baselines and exceeding human-level performance. Ablation and case studies further verify the effectiveness of GRSD. Our method represents a significant step toward efficient, self-evolving, and high-performance mobile GUI automation without expensive manual data annotation.
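The core idea of GRSD — locating the fork point where a failed rollout diverges from a successful one in the same group, then turning the successful action at that step into dense supervision — could be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the string action names, the exhaustive pairing of failures with successes, and the prefix-matching divergence criterion are all assumptions for clarity.

```python
from typing import List, Tuple

def find_fork_point(success: List[str], failure: List[str]) -> int:
    """Return the index of the first step where the failed trajectory's
    action diverges from the successful one (the 'fork point')."""
    for i, (a, b) in enumerate(zip(success, failure)):
        if a != b:
            return i
    # One trajectory is a prefix of the other: fork at the shorter length.
    return min(len(success), len(failure))

def distill_corrections(
    rollouts: List[Tuple[List[str], bool]],
) -> List[Tuple[Tuple[str, ...], str]]:
    """Given a group of rollouts (action sequence, succeeded?), pair each
    failed trajectory with each successful one and emit step-level
    supervision: (shared action prefix, corrective action at the fork)."""
    successes = [traj for traj, ok in rollouts if ok]
    corrections = []
    for traj, ok in rollouts:
        if ok:
            continue
        for ref in successes:
            k = find_fork_point(ref, traj)
            if k < len(ref):  # the reference still has an action to teach
                corrections.append((tuple(traj[:k]), ref[k]))
    return corrections

# Example group: one success and one failure sharing a two-step prefix.
group = [
    (["open_app", "tap_search", "type_query", "submit"], True),
    (["open_app", "tap_search", "tap_back"], False),
]
# Yields one dense training pair: after ("open_app", "tap_search"),
# the corrective action is "type_query".
pairs = distill_corrections(group)
```

In this toy form, each emitted pair is a (state-prefix, target-action) example that could be fed back as step-level fine-tuning data, which is the kind of dense signal the abstract attributes to GRSD; the real method presumably operates on screenshots and model states rather than action strings.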