ProgRM: Build Better GUI Agents with Progress Rewards

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the lack of fine-grained reward signals and the high annotation cost of trajectory labeling in training LLM-based GUI agents, this paper proposes the Progress Reward Model (ProgRM). Methodologically, ProgRM introduces: (1) a dense progress-feedback mechanism grounded in task completion degree, overcoming the binary, outcome-only limitation of conventional Outcome Reward Models (ORMs); and (2) an LCS-based self-labeling algorithm that automatically identifies key steps and generates progress labels via longest-common-subsequence alignment, eliminating manual annotation. The framework integrates online reinforcement learning, progress prediction, and self-supervised reward annotation. Evaluated across diverse GUI benchmarks, ProgRM significantly improves task success rates, outperforming leading closed-source LLMs and ORM baselines and achieving state-of-the-art performance.

📝 Abstract
LLM-based (Large Language Model) GUI (Graphical User Interface) agents can potentially reshape our daily lives significantly. However, current LLM-based GUI agents suffer from the scarcity of high-quality training data owing to the difficulties of trajectory collection and reward annotation. Existing works have been exploring LLMs to collect trajectories for imitation learning or to offer reward signals for online RL training. However, the Outcome Reward Model (ORM) used in existing works cannot provide fine-grained feedback and can over-penalize the valuable steps in finally failed trajectories. To this end, we propose Progress Reward Model (ProgRM) to provide dense informative intermediate rewards by predicting a task completion progress for each step in online training. To handle the challenge of progress reward label annotation, we further design an efficient LCS-based (Longest Common Subsequence) self-annotation algorithm to discover the key steps in trajectories and assign progress labels accordingly. ProgRM is evaluated with extensive experiments and analyses. Actors trained with ProgRM outperform leading proprietary LLMs and ORM-trained actors, illustrating the effectiveness of ProgRM. The codes for experiments will be made publicly available upon acceptance.
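One common way to turn per-step progress predictions into dense intermediate rewards, as the abstract describes, is to reward each step by the progress it gains over the previous step. This is a sketch of the general idea only; the paper's exact reward formulation may differ:

```python
def progress_rewards(progress_preds):
    """Dense per-step rewards from a reward model's progress predictions.

    progress_preds: predicted task-completion progress in [0, 1] for each
    step of a trajectory. The reward for a step is the progress it gained,
    so even a finally-failed trajectory earns credit for useful early steps.
    """
    rewards, prev = [], 0.0
    for p in progress_preds:
        rewards.append(p - prev)
        prev = p
    return rewards
```

Under this shaping, the rewards of a trajectory sum to its final predicted progress, so a trajectory that gets halfway through a task still collects positive signal for its productive steps instead of a flat failure penalty.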
Problem

Research questions and friction points this paper is trying to address.

LLM-based GUI agents lack high-quality training data
Existing reward models provide coarse feedback and penalize valuable steps
ProgRM offers dense progress rewards for better online training
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProgRM provides dense intermediate progress rewards
LCS-based self-annotation for key step discovery
Outperforms ORM-trained agents and proprietary LLMs
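The LCS-based self-annotation above can be sketched as follows: treat a trajectory as a sequence of action identifiers, mark as key steps those that lie on the longest common subsequence with a successful reference trajectory, and label each step with the fraction of key steps completed so far. The function names and reference-trajectory setup here are illustrative assumptions, not the paper's exact formulation:

```python
def lcs_key_steps(traj, ref):
    """Indices of steps in `traj` that lie on the longest common
    subsequence (LCS) with a successful reference trajectory."""
    n, m = len(traj), len(ref)
    # Standard LCS dynamic program: dp[i][j] = LCS length of traj[:i], ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if traj[i] == ref[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Backtrack to recover which trajectory steps are on the LCS
    keys, i, j = [], n, m
    while i > 0 and j > 0:
        if traj[i - 1] == ref[j - 1]:
            keys.append(i - 1)
            i, j = i - 1, j - 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return sorted(keys)


def progress_labels(traj, ref):
    """Label each step with the fraction of key steps completed so far."""
    key = set(lcs_key_steps(traj, ref))
    total = max(len(key), 1)
    labels, done = [], 0
    for i in range(len(traj)):
        if i in key:
            done += 1
        labels.append(done / total)
    return labels
```

A reward model trained on such labels can then predict progress for unseen states, supplying the dense per-step signal used during online RL.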
Danyang Zhang
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China
Situo Zhang
Shanghai Jiao Tong University
Large Language Models, Reinforcement Learning
Ziyue Yang
PhD in Chemical Engineering, University of Rochester
Biomolecules, Machine Learning
Zichen Zhu
Shanghai Jiao Tong University
GUI agents, multimodal large models, human-computer interaction
Zihan Zhao
Shanghai Jiao Tong University
NLP
Ruisheng Cao
Shanghai Jiao Tong University
LLM Agent, text-to-SQL, code generation, semantic parsing, dialogue systems
Lu Chen
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China; Suzhou Laboratory, Suzhou, China
Kai Yu
X-LANCE Lab, School of Computer Science, MoE Key Lab of Artificial Intelligence, SJTU AI Institute, Shanghai Jiao Tong University, Shanghai, China; Jiangsu Key Lab of Language Computing, Suzhou, China; Suzhou Laboratory, Suzhou, China