MagicGUI-RMS: A Multi-Agent Reward Model System for Self-Evolving GUI Agents via Automated Feedback Reflux

📅 2026-01-19
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the lack of efficient, scalable automated evaluation and continual learning mechanisms for GUI agents by proposing a multi-agent reward framework that integrates a domain-specific reward model (DS-RM) with a general-purpose reward model (GP-RM). The approach enables fine-grained behavioral scoring, error correction, and self-evolutionary learning through collaborative assessment, coupled with automatic construction of structured reward data and a feedback reflux mechanism that eliminates the need for manual annotation. Experimental results demonstrate that the framework significantly improves task accuracy and behavioral robustness, establishing an efficient and scalable reward-driven paradigm for self-evolving GUI agents.
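The DS-RM/GP-RM collaboration described above can be pictured as two judges scoring the same proposed action. Below is a minimal Python sketch of that idea, assuming each reward model exposes a score() method returning a value in [0, 1] and that the two judgments are blended with a fixed weight; the GUIAction fields, the RewardModel interface, and the weighting are illustrative assumptions, not the paper's published design.

```python
# Minimal sketch of collaborative action scoring by a domain-specific and a
# general-purpose reward model. Interfaces, the 0-1 score range, and the
# weighted aggregation are assumptions for illustration only.
from dataclasses import dataclass
from typing import Protocol


@dataclass
class GUIAction:
    kind: str          # e.g. "tap", "type", "scroll"
    target: str        # UI element identifier or screen coordinate
    text: str = ""     # payload for "type" actions


class RewardModel(Protocol):
    def score(self, screenshot: bytes, instruction: str, action: GUIAction) -> float:
        """Return a scalar reward in [0, 1] for the proposed action."""
        ...


def collaborative_score(
    ds_rm: RewardModel,
    gp_rm: RewardModel,
    screenshot: bytes,
    instruction: str,
    action: GUIAction,
    ds_weight: float = 0.6,   # assumed weighting, not taken from the paper
) -> float:
    """Blend the domain-specific and general-purpose judgments into one score."""
    ds = ds_rm.score(screenshot, instruction, action)
    gp = gp_rm.score(screenshot, instruction, action)
    return ds_weight * ds + (1.0 - ds_weight) * gp
```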

📝 Abstract
Graphical user interface (GUI) agents are rapidly progressing toward autonomous interaction and reliable task execution across diverse applications. However, two central challenges remain unresolved: automating the evaluation of agent trajectories and generating high-quality training data at scale to enable continual improvement. Existing approaches often depend on manual annotation or static rule-based verification, which restricts scalability and limits adaptability in dynamic environments. We present MagicGUI-RMS, a multi-agent reward model system that delivers adaptive trajectory evaluation, corrective feedback, and self-evolving learning capabilities. MagicGUI-RMS integrates a Domain-Specific Reward Model (DS-RM) with a General-Purpose Reward Model (GP-RM), enabling fine-grained action assessment and robust generalization across heterogeneous GUI tasks. To support reward learning at scale, we design a structured data construction pipeline that automatically produces balanced and diverse reward datasets, effectively reducing annotation costs while maintaining sample fidelity. During execution, the reward model system identifies erroneous actions, proposes refined alternatives, and continuously enhances agent behavior through an automated data-reflux mechanism. Extensive experiments demonstrate that MagicGUI-RMS yields substantial gains in task accuracy and behavioral robustness. These results establish MagicGUI-RMS as a principled and effective foundation for building self-improving GUI agents driven by reward-based adaptation.
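The error-correction and data-reflux loop in the abstract can be sketched as: score each executed action, replace low-scoring actions with a refined alternative, and append the accepted step to a reward-data buffer for later fine-tuning. The acceptance threshold, the score_fn and propose_fix callables, and the buffer format below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of a per-step correction and data-reflux loop: low-scoring
# actions are replaced by a refined alternative, and every accepted (possibly
# corrected) step is stored so the agent can be fine-tuned on it later.
from typing import Any, Callable, List, Tuple

Step = Tuple[bytes, str, Any]  # (screenshot, instruction, action)


def reflux_step(
    step: Step,
    score_fn: Callable[[bytes, str, Any], float],      # e.g. the collaborative score above
    propose_fix: Callable[[bytes, str, Any], Any],      # proposes a refined alternative action
    buffer: List[Tuple[Step, float]],                   # automatically grown reward dataset
    threshold: float = 0.5,                             # assumed acceptance threshold
) -> Any:
    """Score an executed action; if it looks erroneous, swap in a refined one."""
    screenshot, instruction, action = step
    score = score_fn(screenshot, instruction, action)
    if score < threshold:
        action = propose_fix(screenshot, instruction, action)   # refined alternative
        score = score_fn(screenshot, instruction, action)       # re-evaluate the correction
    buffer.append(((screenshot, instruction, action), score))   # automated feedback reflux
    return action
```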
Problem

Research questions and friction points this paper is trying to address.

GUI agents
reward model
automated evaluation
training data generation
self-evolving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reward Model
Self-Evolving GUI Agents
Automated Feedback Reflux
Reward Learning
Trajectory Evaluation
👥 Authors
Zecheng Li
Honor Device Co., Ltd
Zhihui Cao
Honor Device Co., Ltd
Wenke Huang
School of Computer Science, Wuhan University
Federated Learning, MLLM
Yudong Zhang
University of Leicester, HFWLA/FIET/FEAI/FBCS/SMIEEE/SMACM/DSACM, Clarivate Highly Cited Researcher
artificial intelligence, deep learning, medical image processing
Keying Qi
Honor Device Co., Ltd
Rui Wang
Honor Device Co., Ltd
Zeyu Zheng
DeepMind
artificial intelligence, machine learning, reinforcement learning, deep learning
Jian Zhao
Honor Device Co., Ltd
Hao Zhu
Honor Device Co., Ltd
Hengxin Wu
Honor Device Co., Ltd
Yuran Wang
Honor Device Co., Ltd
Guitao Fan
Honor Device Co., Ltd
Guokun Wu
Honor Device Co., Ltd
Yicong Liu
Honor Device Co., Ltd
Zhilin Gao
Honor Device Co., Ltd
Haikun Xu
Honor Device Co., Ltd
He Yang
Xi'an Jiaotong University
Federated Learning, Deep Learning, Privacy & Security
Minqi Xiang
Honor Device Co., Ltd
Xingyu Liu
Honor Device Co., Ltd
Zuojiang Wang
Honor Device Co., Ltd