Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In GUI automation, step-by-step decision-making has little tolerance for error: a single wrong action can trigger cascading failures or irreversible outcomes (e.g., accidental deletion or payment). To address this, we propose a pre-operative critic mechanism that reasons about the likely outcome and correctness of an action before it is executed, so that risky steps can be caught in advance. Methodologically, we introduce the pre-operative critic paradigm; design Suggestion-aware Gradient Relative Policy Optimization (S-GRPO), which adds a suggestion reward to make the critic's feedback more reliable; and construct GUI-Critic-Train and GUI-Critic-Test through a reasoning-bootstrapping data collection pipeline, filling gaps in existing GUI critic data. Static experiments on GUI-Critic-Test, covering mobile and web domains, show higher critic accuracy than current MLLMs, and dynamic evaluation on a GUI automation benchmark shows improved success rates and operational efficiency.
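The pre-operative idea described above can be pictured as a gate between the agent's proposal and the environment. The following is a minimal sketch under stated assumptions, not the paper's implementation: `Critique`, `run_step`, and the toy `propose`/`critic` callables are hypothetical names introduced for illustration, and a real critic would be an MLLM reading a screenshot rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Critique:
    """Hypothetical critic output: a verdict plus a corrective suggestion."""
    is_correct: bool
    suggestion: str = ""


def run_step(
    propose: Callable[[str, Optional[str]], str],
    critic: Callable[[str, str], Critique],
    execute: Callable[[str], object],
    state: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Critique each proposed action BEFORE executing it.

    Returns the executed action, or None if no proposal was approved
    within max_attempts (in which case nothing is executed).
    """
    hint: Optional[str] = None
    for _ in range(max_attempts):
        action = propose(state, hint)
        verdict = critic(state, action)
        if verdict.is_correct:
            execute(action)          # only approved actions reach the GUI
            return action
        hint = verdict.suggestion    # feed the suggestion back to the agent
    return None


# Toy stand-ins (assumptions): the agent first proposes a risky action,
# then follows the critic's suggestion on the next attempt.
def toy_propose(state: str, hint: Optional[str]) -> str:
    return "tap(Delete)" if hint is None else "tap(Archive)"


def toy_critic(state: str, action: str) -> Critique:
    if "Delete" in action:
        return Critique(False, "Archive the message instead of deleting it.")
    return Critique(True)


executed: List[str] = []
result = run_step(toy_propose, toy_critic, executed.append, state="inbox screen")
```

In this sketch an action that is never approved is simply not executed, which is one conservative design choice; the source does not say how the authors handle repeated rejections.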

📝 Abstract

In recent years, Multimodal Large Language Models (MLLMs) have been extensively utilized for multimodal reasoning tasks, including Graphical User Interface (GUI) automation. Unlike general offline multimodal tasks, GUI automation is executed in online interactive environments, necessitating step-by-step decision-making based on the real-time status of the environment. This task has a lower tolerance for decision-making errors at each step, as mistakes can accumulate, disrupt the process, and potentially lead to irreversible outcomes such as deletions or payments. To address these issues, we introduce a pre-operative critic mechanism that provides effective feedback prior to the actual execution by reasoning about the potential outcome and correctness of actions. Specifically, we propose a Suggestion-aware Gradient Relative Policy Optimization (S-GRPO) strategy to construct our pre-operative critic model GUI-Critic-R1, incorporating a novel suggestion reward to enhance the reliability of the model's feedback. Furthermore, we develop a reasoning-bootstrapping-based data collection pipeline to create GUI-Critic-Train and GUI-Critic-Test, filling existing gaps in GUI critic data. Static experiments on GUI-Critic-Test across both mobile and web domains reveal that GUI-Critic-R1 offers significant advantages in critic accuracy compared to current MLLMs. Dynamic evaluation on a GUI automation benchmark further highlights the effectiveness and superiority of our model, as evidenced by improved success rates and operational efficiency.
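The abstract states only that S-GRPO incorporates a suggestion reward; it does not give the reward formula. As a rough illustration, the sketch below assumes the common GRPO-style pattern of scoring a group of sampled critic responses and normalizing each reward against the group mean and standard deviation. The function names, the two reward terms, and the weights `w_verdict` and `w_suggestion` are all hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev
from typing import List


def critic_reward(
    verdict_correct: bool,
    suggestion_score: float,
    w_verdict: float = 0.7,      # hypothetical weight, not from the paper
    w_suggestion: float = 0.3,   # hypothetical weight, not from the paper
) -> float:
    """Combine a verdict-accuracy term with a suggestion-quality term.

    suggestion_score is assumed to lie in [0, 1] and is clamped to that range.
    """
    score = min(max(suggestion_score, 0.0), 1.0)
    return w_verdict * float(verdict_correct) + w_suggestion * score


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of four sampled critic responses for the same GUI step (toy values).
group = [
    critic_reward(True, 0.9),    # right verdict, helpful suggestion
    critic_reward(True, 0.2),    # right verdict, weak suggestion
    critic_reward(False, 0.0),   # wrong verdict, no useful suggestion
    critic_reward(False, 0.5),   # wrong verdict, partially useful suggestion
]
advantages = group_relative_advantages(group)
```

Under these assumptions, responses with a correct verdict and a useful suggestion receive positive advantages relative to their group, and a group in which every response scores the same yields zero advantage for all of them.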

Problem

Research questions and friction points this paper is trying to address.

Diagnosing errors before execution in GUI automation to prevent irreversible outcomes

Enhancing feedback reliability with the S-GRPO strategy for the GUI-Critic-R1 model

Addressing data gaps in GUI critic evaluation through a reasoning-bootstrapping pipeline

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-operative critic mechanism for error prevention

S-GRPO strategy with a suggestion reward for more reliable feedback

Reasoning-bootstrapping data collection pipeline
