Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
GUI automation proceeds through stepwise decisions with little tolerance for error, so a single wrong action can cascade into failures such as accidental deletions or payments. To address this, we propose a pre-execution critique mechanism that predicts and evaluates the consequences of an action before it is executed, enabling proactive risk mitigation. Methodologically, we introduce the "pre-action critique" paradigm; design Suggestion-aware Gradient Relative Policy Optimization (S-GRPO) to train a robust critique policy; and construct GUI-Critic-Train and GUI-Critic-Test, the first GUI-specific critique datasets, by integrating multimodal large language models (MLLMs), GUI-state-aware modeling, and reasoning-guided synthetic data generation. Experiments show higher critique accuracy on the cross-platform (mobile and web) GUI-Critic-Test benchmark, and higher task success rate and operational efficiency than state-of-the-art MLLM-based baselines in dynamic GUI environments.

📝 Abstract
In recent years, Multimodal Large Language Models (MLLMs) have been extensively utilized for multimodal reasoning tasks, including Graphical User Interface (GUI) automation. Unlike general offline multimodal tasks, GUI automation is executed in online interactive environments, necessitating step-by-step decision-making based on the real-time status of the environment. This task has a lower tolerance for decision-making errors at each step, as any mistake may cumulatively disrupt the process and potentially lead to irreversible outcomes like deletions or payments. To address these issues, we introduce a pre-operative critic mechanism that provides effective feedback prior to the actual execution, by reasoning about the potential outcome and correctness of actions. Specifically, we propose a Suggestion-aware Gradient Relative Policy Optimization (S-GRPO) strategy to construct our pre-operative critic model GUI-Critic-R1, incorporating a novel suggestion reward to enhance the reliability of the model's feedback. Furthermore, we develop a reasoning-bootstrapping based data collection pipeline to create a GUI-Critic-Train and a GUI-Critic-Test, filling existing gaps in GUI critic data. Static experiments on the GUI-Critic-Test across both mobile and web domains reveal that our GUI-Critic-R1 offers significant advantages in critic accuracy compared to current MLLMs. Dynamic evaluation on a GUI automation benchmark further highlights the effectiveness and superiority of our model, as evidenced by improved success rates and operational efficiency.
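To make the pre-operative idea concrete, here is a minimal sketch of how a critic could gate an agent's actions before they are executed. It is an illustration under stated assumptions, not the authors' implementation: `Critique`, `run_step`, `propose`, `critic`, `execute`, and `max_retries` are all hypothetical names, and the abstract does not say how rejected actions are handled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Critique:
    is_correct: bool  # critic's verdict on the proposed action
    suggestion: str   # corrective hint, used when the action is rejected


def run_step(
    propose: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Critique],
    execute: Callable[[str], None],
    state: str,
    max_retries: int = 2,  # hypothetical retry budget, not from the paper
) -> str:
    """Critique a proposed action BEFORE executing it, re-proposing on rejection."""
    hint: Optional[str] = None
    action = propose(state, hint)
    for _ in range(max_retries):
        verdict = critic(state, action)
        if verdict.is_correct:
            break
        # Feed the critic's suggestion back to the agent and try again.
        hint = verdict.suggestion
        action = propose(state, hint)
    # Assumption: once the retry budget is spent, the latest proposal is
    # executed unchecked; a stricter policy could abort instead.
    execute(action)
    return action


def demo() -> Tuple[str, List[str]]:
    executed: List[str] = []

    def propose(state: str, hint: Optional[str]) -> str:
        # Stub agent: picks a risky action first, follows the hint afterwards.
        return "tap(Cancel)" if hint else "tap(Delete)"

    def critic(state: str, action: str) -> Critique:
        # Stub critic: rejects anything that would delete data.
        if "Delete" in action:
            return Critique(False, "Deleting is irreversible; tap Cancel instead")
        return Critique(True, "")

    action = run_step(propose, critic, executed.append, "confirm dialog")
    return action, executed
```

In this sketch the risky action is never executed: the critic rejects it, the agent re-proposes with the suggestion, and only the approved action reaches the environment.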
Problem

Research questions and friction points this paper is trying to address.

Diagnosing pre-operative errors in GUI automation to prevent irreversible outcomes
Enhancing feedback reliability with S-GRPO strategy for GUI-Critic-R1 model
Addressing data gaps in GUI critic evaluation through reasoning-bootstrapping pipeline
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-operative critic mechanism for error prevention
S-GRPO strategy enhances feedback reliability
Reasoning-bootstrapping data collection pipeline
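The page does not spell out how the suggestion reward enters training. As a rough sketch only, assuming S-GRPO follows the common GRPO-style recipe of scoring several sampled responses per prompt and normalizing their rewards within that group, the reward could be combined as below. The weights, the three reward terms, and the function names `total_reward` and `group_relative_advantages` are assumptions for illustration, not values or APIs from the paper.

```python
from statistics import mean, pstdev
from typing import List


def total_reward(
    format_ok: bool,
    verdict_ok: bool,
    suggestion_score: float,
    w_format: float = 0.1,   # hypothetical weight
    w_verdict: float = 1.0,  # hypothetical weight
    w_suggest: float = 0.5,  # hypothetical weight
) -> float:
    """Weighted sum of format, verdict-accuracy, and suggestion rewards."""
    # Clamp the suggestion score to [0, 1] so it cannot dominate the sum.
    s = min(1.0, max(0.0, suggestion_score))
    return w_format * float(format_ok) + w_verdict * float(verdict_ok) + w_suggest * s


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under this assumption, a response whose suggestion is judged more useful than those of its siblings in the same group receives a positive advantage even when all of them reach the same verdict.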
Authors
Yuyang Wanyan
MAIS, Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China
Xi Zhang
Alibaba Group
Haiyang Xu
Alibaba Group
Haowei Liu
TongYi Lab, Alibaba Group
Multimodal Learning
Junyang Wang
Beijing Jiaotong University
Jiabo Ye
Alibaba Inc. Tongyi Lab, mPLUG Team
Vision-Language, GUI Agent
Yutong Kou
MAIS, Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China
Ming Yan
Alibaba Group
Fei Huang
Alibaba Group
Xiaoshan Yang
MAIS, Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China
Weiming Dong
MAIS, Institute of Automation, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China
Changsheng Xu
Professor, Institute of Automation, Chinese Academy of Sciences
Multimedia, Computer vision