CGL: Advancing Continual GUI Learning via Reinforcement Fine-Tuning

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of catastrophic forgetting in GUI agents under frequent application updates, which hinders effective continual learning. To mitigate this issue, the authors propose the CGL framework, which dynamically integrates supervised fine-tuning (SFT) with GRPO-based reinforcement learning. CGL introduces an entropy-driven adaptive mechanism to adjust the SFT ratio and employs a gradient surgery strategy tailored for GRPO to alleviate both knowledge overwriting and gradient conflicts. Evaluated on the newly constructed AndroidControl-CL benchmark, CGL consistently outperforms existing methods across diverse continual learning scenarios, effectively balancing rapid adaptation to new tasks with robust retention of previously acquired skills.
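The entropy-driven adaptive mechanism mentioned above can be sketched as follows. This is an illustrative assumption, not the paper's actual formulation: the mapping from policy entropy to SFT weight (here a clipped linear schedule between hypothetical bounds `h_min` and `h_max`) and the simple convex loss mixture are placeholders for whatever rule CGL uses.

```python
def sft_weight(entropy: float, h_min: float = 0.1, h_max: float = 2.0) -> float:
    """Map policy entropy to an SFT mixing weight in [0, 1].

    Intuition from the summary: high entropy (policy uncertain on new
    tasks) -> lean on SFT for fast adaptation; low entropy -> lean on
    GRPO-based RL to protect previously acquired skills.
    The linear schedule and bounds are assumptions for illustration.
    """
    t = (entropy - h_min) / (h_max - h_min)
    return min(1.0, max(0.0, t))


def mixed_loss(loss_sft: float, loss_grpo: float, entropy: float) -> float:
    """Convex combination of the SFT and GRPO losses, weighted by entropy."""
    w = sft_weight(entropy)
    return w * loss_sft + (1.0 - w) * loss_grpo
```

At the entropy ceiling the objective reduces to pure SFT; at the floor, to pure GRPO, matching the adapt-vs-retain trade-off the summary describes.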

📝 Abstract
Graphical User Interface (GUI) agents, benefiting from recent advances in multimodal large language models (MLLMs), have developed rapidly. However, due to the frequent updates of GUI applications, adapting to new tasks without forgetting old ones in GUI continual learning remains an open problem. In this work, we reveal that while Supervised Fine-Tuning (SFT) facilitates fast adaptation, it often triggers knowledge overwriting, whereas Reinforcement Learning (RL) demonstrates an inherent resilience that shields prior interaction logic from erasure. Based on this insight, we propose a Continual GUI Learning (CGL) framework that dynamically balances adaptation efficiency and skill retention by enhancing the synergy between SFT and RL. Specifically, we introduce an SFT proportion adjustment mechanism guided by policy entropy to dynamically control the weight allocation between the SFT and RL training phases. To resolve explicit gradient interference, we further develop a specialized gradient surgery strategy. By projecting exploratory SFT gradients onto GRPO-based anchor gradients, our method explicitly clips the components of SFT gradients that conflict with GRPO. On top of that, we establish an AndroidControl-CL benchmark, which divides GUI applications into distinct task groups to effectively simulate and evaluate the performance of continual GUI learning. Experimental results demonstrate the effectiveness of our proposed CGL framework across continual learning scenarios. The benchmark, code, and model will be made publicly available.
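The gradient surgery the abstract describes (projecting SFT gradients onto GRPO anchor gradients and clipping the conflicting component) can be sketched in the style of projected-gradient methods such as PCGrad. This is a minimal illustration under that assumption; the paper's exact projection rule, and how it is applied per-layer or per-parameter, may differ.

```python
import numpy as np


def surgery(g_sft: np.ndarray, g_grpo: np.ndarray) -> np.ndarray:
    """Remove the component of the SFT gradient that conflicts with
    the GRPO anchor gradient.

    If the two gradients point in opposing directions (negative dot
    product), subtract the projection of g_sft onto g_grpo so the
    remaining SFT update is orthogonal to the anchor, i.e. it no
    longer pushes against the RL signal that preserves prior skills.
    Sketch only; the paper's exact clipping rule is an assumption.
    """
    dot = float(np.dot(g_sft, g_grpo))
    if dot < 0.0:  # conflict detected
        g_sft = g_sft - dot / (np.dot(g_grpo, g_grpo) + 1e-12) * g_grpo
    return g_sft
```

After surgery, a conflicting SFT gradient has zero projection onto the anchor, while non-conflicting gradients pass through unchanged, which is the asymmetry that lets SFT adapt quickly without overwriting what GRPO protects.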
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Graphical User Interface
Catastrophic Forgetting
GUI Agents
Task Adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual Learning
Reinforcement Learning
Supervised Fine-Tuning
Gradient Surgery
GUI Agents