🤖 AI Summary
This work addresses the inefficiency of online reinforcement learning in mobile GUI agents, which often stems from difficulties in credit assignment and the absence of mechanisms for transferring learned experience, leading to poor performance on long-horizon tasks and repeated errors across tasks. To overcome these limitations, the authors propose UI-Mem, a novel framework built around a self-evolving experience memory. UI-Mem employs a hierarchical memory bank to store structured knowledge—including workflows, subtask skills, and failure patterns—and leverages parameterized templates to enable transfer across both tasks and applications. Coupled with a stratified group sampling strategy, this approach maintains exploration diversity while helping the policy internalize guided behaviors. Experimental results demonstrate that UI-Mem significantly outperforms existing reinforcement learning baselines and static reuse methods on multiple online GUI benchmarks, and generalizes well to unseen applications.
📝 Abstract
Online Reinforcement Learning (RL) offers a promising paradigm for enhancing GUI agents through direct environment interaction. However, its effectiveness is severely hindered by inefficient credit assignment in long-horizon tasks and repetitive errors across tasks due to the lack of experience transfer. To address these challenges, we propose UI-Mem, a novel framework that enhances GUI online RL with a Hierarchical Experience Memory. Unlike traditional replay buffers, our memory accumulates structured knowledge, including high-level workflows, subtask skills, and failure patterns. These experiences are stored as parameterized templates that enable cross-task and cross-application transfer. To effectively integrate memory guidance into online RL, we introduce Stratified Group Sampling, which injects varying levels of guidance across trajectories within each rollout group to maintain outcome diversity, driving the unguided policy toward internalizing guided behaviors. Furthermore, a Self-Evolving Loop continuously abstracts novel strategies and errors to keep the memory aligned with the agent's evolving policy. Experiments on online GUI benchmarks demonstrate that UI-Mem significantly outperforms traditional RL baselines and static reuse strategies, with strong generalization to unseen applications. Project page: https://ui-mem.github.io
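To make the two core ideas concrete, here is a minimal Python sketch of what a parameterized experience template and stratified group sampling might look like. All class names, fields, and the guidance-level scheme below are illustrative assumptions for exposition, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class ExperienceTemplate:
    """Hypothetical memory entry: a stored experience with placeholders
    so it can transfer across tasks and applications."""
    kind: str      # "workflow" | "skill" | "failure" (illustrative taxonomy)
    pattern: str   # e.g. "open {app}; search {query}; tap first result"
    params: tuple  # placeholder names bound at retrieval time

    def instantiate(self, **bindings) -> str:
        # Bind concrete task parameters into the abstract template.
        return self.pattern.format(**bindings)

def stratified_guidance_levels(group_size: int,
                               levels: tuple = (0.0, 0.5, 1.0)) -> list:
    """Assign each trajectory in a rollout group a guidance level, so the
    group mixes unguided (0.0), partially guided, and fully guided rollouts
    and the group-relative advantage can push the unguided policy toward
    internalizing guided behavior."""
    return [levels[i % len(levels)] for i in range(group_size)]
```

A usage sketch: a workflow template learned on one app can be re-instantiated for another, and each rollout group spans the full range of guidance.

```python
tmpl = ExperienceTemplate(kind="workflow",
                          pattern="open {app}; search {query}; tap first result",
                          params=("app", "query"))
plan = tmpl.instantiate(app="Maps", query="coffee near me")
levels = stratified_guidance_levels(6)  # e.g. [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
```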