MagicGUI-RMS: A Multi-Agent Reward Model System for Self-Evolving GUI Agents via Automated Feedback Reflux

📅 2026-01-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the lack of efficient, scalable automated evaluation and continual learning mechanisms for GUI agents by proposing a multi-agent reward framework that integrates a domain-specific reward model (DS-RM) with a general-purpose reward model (GP-RM). The approach enables fine-grained behavioral scoring, error correction, and self-evolutionary learning through collaborative assessment, coupled with automatic construction of structured reward data and a feedback reflux mechanism that eliminates the need for manual annotation. Experimental results demonstrate that the framework significantly improves task accuracy and behavioral robustness, establishing an efficient and scalable reward-driven paradigm for self-evolving GUI agents.
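The DS-RM/GP-RM collaboration described above can be pictured as two judges scoring the same proposed action. Below is a minimal Python sketch of that idea, assuming each reward model exposes a score() method returning a value in [0, 1] and that the two judgments are blended with a fixed weight; the GUIAction fields, the RewardModel interface, and the weighting are illustrative assumptions, not the paper's published design.

```python
# Minimal sketch of collaborative action scoring by a domain-specific and a
# general-purpose reward model. Interfaces, the 0-1 score range, and the
# weighted aggregation are assumptions for illustration only.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GUIAction:
    kind: str          # e.g. "tap", "type", "scroll"
    target: str        # UI element identifier or screen coordinate
    text: str = ""     # payload for "type" actions


class RewardModel(Protocol):
    def score(self, screenshot: bytes, instruction: str, action: GUIAction) -> float:
        """Return a scalar reward in [0, 1] for the proposed action."""
        ...


def collaborative_score(
    ds_rm: RewardModel,
    gp_rm: RewardModel,
    screenshot: bytes,
    instruction: str,
    action: GUIAction,
    ds_weight: float = 0.6,   # assumed weighting, not taken from the paper
) -> float:
    """Blend the domain-specific and general-purpose judgments into one score."""
    ds = ds_rm.score(screenshot, instruction, action)
    gp = gp_rm.score(screenshot, instruction, action)
    return ds_weight * ds + (1.0 - ds_weight) * gp
```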

📝 Abstract
Graphical user interface (GUI) agents are rapidly progressing toward autonomous interaction and reliable task execution across diverse applications. However, two central challenges remain unresolved: automating the evaluation of agent trajectories and generating high-quality training data at scale to enable continual improvement. Existing approaches often depend on manual annotation or static rule-based verification, which restricts scalability and limits adaptability in dynamic environments. We present MagicGUI-RMS, a multi-agent reward model system that delivers adaptive trajectory evaluation, corrective feedback, and self-evolving learning capabilities. MagicGUI-RMS integrates a Domain-Specific Reward Model (DS-RM) with a General-Purpose Reward Model (GP-RM), enabling fine-grained action assessment and robust generalization across heterogeneous GUI tasks. To support reward learning at scale, we design a structured data construction pipeline that automatically produces balanced and diverse reward datasets, effectively reducing annotation costs while maintaining sample fidelity. During execution, the reward model system identifies erroneous actions, proposes refined alternatives, and continuously enhances agent behavior through an automated data-reflux mechanism. Extensive experiments demonstrate that MagicGUI-RMS yields substantial gains in task accuracy and behavioral robustness. These results establish MagicGUI-RMS as a principled and effective foundation for building self-improving GUI agents driven by reward-based adaptation.
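The error-correction and data-reflux loop in the abstract can be sketched as: score each executed action, replace low-scoring actions with a refined alternative, and append the accepted step to a reward-data buffer for later fine-tuning. The acceptance threshold, the score_fn and propose_fix callables, and the buffer format below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of a per-step correction and data-reflux loop: low-scoring
# actions are replaced by a refined alternative, and every accepted (possibly
# corrected) step is stored so the agent can be fine-tuned on it later.
from typing import Any, Callable, List, Tuple

Step = Tuple[bytes, str, Any]  # (screenshot, instruction, action)


def reflux_step(
    step: Step,
    score_fn: Callable[[bytes, str, Any], float],      # e.g. the collaborative score above
    propose_fix: Callable[[bytes, str, Any], Any],      # proposes a refined alternative action
    buffer: List[Tuple[Step, float]],                   # automatically grown reward dataset
    threshold: float = 0.5,                             # assumed acceptance threshold
) -> Any:
    """Score an executed action; if it looks erroneous, swap in a refined one."""
    screenshot, instruction, action = step
    score = score_fn(screenshot, instruction, action)
    if score < threshold:
        action = propose_fix(screenshot, instruction, action)   # refined alternative
        score = score_fn(screenshot, instruction, action)       # re-evaluate the correction
    buffer.append(((screenshot, instruction, action), score))   # automated feedback reflux
    return action
```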
Problem

Research questions and friction points this paper is trying to address.

GUI agents
reward model
automated evaluation
training data generation
self-evolving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reward Model
Self-Evolving GUI Agents
Automated Feedback Reflux
Reward Learning
Trajectory Evaluation
👥 Authors
Zecheng Li
Honor Device Co., Ltd
Zhihui Cao
Honor Device Co., Ltd
Wenke Huang
School of Computer Science, Wuhan University
Federated Learning, MLLM
Yudong Zhang
University of Leicester, HFWLA/FIET/FEAI/FBCS/SMIEEE/SMACM/DSACM, Clarivate Highly Cited Researcher
artificial intelligence, deep learning, medical image processing
Keying Qi
Honor Device Co., Ltd
Rui Wang
Honor Device Co., Ltd
Zeyu Zheng
DeepMind
artificial intelligence, machine learning, reinforcement learning, deep learning
Jian Zhao
Honor Device Co., Ltd
Hao Zhu
Honor Device Co., Ltd
Hengxin Wu
Honor Device Co., Ltd
Yuran Wang
Honor Device Co., Ltd
Guitao Fan
Honor Device Co., Ltd
Guokun Wu
Honor Device Co., Ltd
Yicong Liu
Honor Device Co., Ltd
Zhilin Gao
Honor Device Co., Ltd
Haikun Xu
Honor Device Co., Ltd
He Yang
Xi'an Jiaotong University
Federated Learning, Deep Learning, Privacy & Security
Minqi Xiang
Honor Device Co., Ltd
Xingyu Liu
Honor Device Co., Ltd
Zuojiang Wang
Honor Device Co., Ltd