🤖 AI Summary
Existing research on mobile agents is hindered by closed training datasets and opaque task and trajectory synthesis methods, which limits reproducibility and extensibility. This work proposes OpenMobile, an open-source framework that releases high-quality training data for mobile agents together with two synthesis components: a task generation pipeline grounded in a global environment memory built from exploration, which produces diverse, grounded instructions, and a policy-switching trajectory rollout strategy that alternates between learner and expert models to capture error-recovery trajectories that standard imitation learning often misses. Qwen2.5-VL and Qwen3-VL models fine-tuned on this data achieve success rates of 51.7% and 64.7%, respectively, on AndroidWorld, substantially outperforming existing open-data approaches, and perform competitively across three dynamic mobile agent benchmarks. An analysis of the overlap between the synthetic instructions and the benchmark test sets indicates that these gains stem from broad functionality coverage rather than test-set overfitting.
📝 Abstract
Mobile agents powered by vision-language models have demonstrated impressive capabilities in automating mobile tasks, with recent leading models achieving a marked performance leap, e.g., nearly 70% success on AndroidWorld. However, these systems keep their training data closed and remain opaque about their task and trajectory synthesis recipes. We present OpenMobile, an open-source framework that synthesizes high-quality task instructions and agent trajectories, with two key components: (1) a scalable task synthesis pipeline that constructs a global environment memory from exploration and then leverages it to generate diverse, grounded instructions; and (2) a policy-switching strategy for trajectory rollout that, by alternating between learner and expert models, captures essential error-recovery data often missing in standard imitation learning. Agents trained on our data achieve competitive results across three dynamic mobile agent benchmarks; notably, our fine-tuned Qwen2.5-VL and Qwen3-VL reach 51.7% and 64.7% on AndroidWorld, far surpassing existing open-data approaches. Furthermore, we conduct transparent analyses of the overlap between our synthetic instructions and benchmark test sets, and verify that the performance gains stem from broad functionality coverage rather than benchmark overfitting. We release data and code at https://njucckevin.github.io/openmobile/ to bridge the data gap and facilitate broader mobile agent research.
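The abstract does not spell out how the global environment memory is represented or queried. The Python sketch below shows one simple reading, purely for illustration: exploration transitions of the form (screen, action, next screen) are merged into a single screen graph, and every bounded action path in that graph becomes a template instruction, so each synthesized task is grounded in transitions that were actually observed. The transition format, the breadth-first path enumeration, the template strings (a stand-in for model-written instructions), and the names `build_memory`, `reachable_paths`, and `synthesize_tasks` are all hypothetical and are not OpenMobile's actual pipeline.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

# Hypothetical exploration record: (screen_id, action_description, next_screen_id).
Transition = Tuple[str, str, str]
Memory = Dict[str, Dict[str, str]]


def build_memory(transitions: List[Transition]) -> Memory:
    """Merge transitions from many exploration episodes into one global screen graph."""
    graph: Memory = defaultdict(dict)
    for src, action, dst in transitions:
        graph[src][action] = dst  # simplification: one destination per (screen, action)
    return graph


def reachable_paths(memory: Memory, start: str, max_len: int) -> List[Tuple[List[str], str]]:
    """Breadth-first enumeration of observed action sequences of length 1..max_len from `start`."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        screen, actions = queue.popleft()
        if actions:
            paths.append((actions, screen))
        if len(actions) == max_len:
            continue  # depth bound also guarantees termination on cyclic graphs
        for action, nxt in memory.get(screen, {}).items():
            queue.append((nxt, actions + [action]))
    return paths


def synthesize_tasks(memory: Memory, start: str, max_len: int = 2) -> List[str]:
    """Turn each grounded path into a template instruction (stand-in for a model-written one)."""
    return [
        " then ".join(actions) + f" (ends on {screen})"
        for actions, screen in reachable_paths(memory, start, max_len)
    ]


if __name__ == "__main__":
    log = [
        ("home", "open Settings", "settings"),
        ("settings", "tap Wi-Fi", "wifi"),
        ("home", "open Clock", "clock"),
    ]
    for task in synthesize_tasks(build_memory(log), "home"):
        print(task)
```

Merging all episodes into one graph is what lets a two-step task such as "open Settings then tap Wi-Fi" be composed even if no single exploration episode performed exactly that sequence; the depth bound keeps enumeration finite when screens link back to each other.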
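The abstract does not specify when control passes between the learner and the expert. The sketch below illustrates the general idea on a hypothetical toy task (reach an integer `goal` on a number line by stepping -1 or +1): control alternates between an imperfect learner and an expert in fixed-length segments, so the learner's mistakes push the trajectory into off-course states and the expert's subsequent actions demonstrate how to recover from them. The fixed segment lengths, the noisy learner, the rule of keeping only expert-acted steps as labels, and every function name here are assumptions for illustration, not the paper's recipe.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical toy task standing in for a phone GUI: move along an integer line to `goal`.
Policy = Callable[[int, int], int]  # (state, goal) -> action in {-1, +1}


def expert_policy(state: int, goal: int) -> int:
    """Stand-in for a strong expert model: always steps toward the goal."""
    return 1 if goal > state else -1


def make_learner(error_rate: float, rng: random.Random) -> Policy:
    """Stand-in for the learner: copies the expert but errs with probability `error_rate`."""
    def learner(state: int, goal: int) -> int:
        action = expert_policy(state, goal)
        return -action if rng.random() < error_rate else action
    return learner


@dataclass
class Step:
    state: int
    action: int
    actor: str  # "learner" or "expert"


def policy_switching_rollout(start: int, goal: int, learner: Policy, expert: Policy,
                             learner_len: int = 1, expert_len: int = 2,
                             max_steps: int = 50) -> List[Step]:
    """Alternate fixed-length learner and expert segments until the goal or the step budget."""
    if learner_len < 1 or expert_len < 1:
        raise ValueError("segment lengths must be positive")
    segments = [("learner", learner, learner_len), ("expert", expert, expert_len)]
    state, steps, turn = start, [], 0
    while state != goal and len(steps) < max_steps:
        actor, policy, seg_len = segments[turn % 2]
        for _ in range(seg_len):
            if state == goal or len(steps) >= max_steps:
                break
            action = policy(state, goal)
            steps.append(Step(state, action, actor))
            state += action
        turn += 1
    return steps


def expert_supervision(steps: List[Step], goal: int) -> List[Tuple[int, int, bool]]:
    """Keep expert-acted steps as labels; flag those that directly follow a learner error."""
    data = []
    for i, step in enumerate(steps):
        if step.actor != "expert":
            continue
        prev = steps[i - 1] if i > 0 else None
        recovering = (prev is not None and prev.actor == "learner"
                      and prev.action != expert_policy(prev.state, goal))
        data.append((step.state, step.action, recovering))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    trajectory = policy_switching_rollout(0, 5, make_learner(0.3, rng), expert_policy)
    labels = expert_supervision(trajectory, goal=5)
    print(len(trajectory), "steps,", sum(r for _, _, r in labels), "recovery labels")
```

The point of the contrast with pure expert rollouts is that an expert acting alone never visits off-course states, so its demonstrations contain no recovery behavior; interleaving the learner is one way to surface those states while still taking labels only from the expert.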