GPA: Learning GUI Process Automation from Demonstrations

📅 2026-04-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two obstacles to reliably reproducing complex workflows in enterprise settings: the fragility of traditional robotic process automation (RPA) systems and the non-determinism of existing GUI agents based on vision-language models. The authors propose a lightweight, general-purpose, fully local, vision-driven GUI automation approach that replays a task with high fidelity from a single demonstration. Key contributions are Sequential Monte Carlo-based localization of UI elements, which adds robustness to rescaling and detection uncertainty; a readiness calibration mechanism that safeguards deterministic execution; and a fully local architecture that preserves data privacy. In a pilot experiment on long-horizon GUI tasks, the method achieves a higher success rate than Gemini 3 Pro (augmented with CUA tools) and executes ten times faster.
📝 Abstract
GUI Process Automation (GPA) is a lightweight but general vision-based approach to Robotic Process Automation (RPA) that enables fast and stable process replay from a single demonstration. Addressing the fragility of traditional RPA and the non-determinism risks of current GUI agents based on vision-language models, GPA offers three core benefits: (1) robustness, via Sequential Monte Carlo-based localization that handles rescaling and detection uncertainty; (2) determinism and reliability, safeguarded by readiness calibration; and (3) privacy, through fast, fully local execution. This approach delivers the adaptability, robustness, and security required for enterprise workflows. GPA can also be used as an MCP/CLI tool by other agents with coding capabilities, so that the agent only reasons and orchestrates while GPA handles the GUI execution. We conducted a pilot experiment comparing GPA with Gemini 3 Pro (with CUA tools) and found that GPA achieves a higher success rate and executes 10 times faster on long-horizon GUI tasks.
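The abstract does not say how the Sequential Monte Carlo localization is implemented. As a rough illustration of the general technique only, the sketch below runs a basic particle filter over (x, y, scale) hypotheses for a UI element. Everything in it is an assumption made for illustration: the function names `smc_localize` and `toy_score`, the synthetic Gaussian match score (a stand-in for whatever visual matching score a real system would compute from a screenshot), the screen bounds, and all parameter values. It is not the authors' method.

```python
import math
import random


def smc_localize(score, bounds, n_particles=2000, n_iters=8,
                 noise=(12.0, 12.0, 0.05), seed=0):
    """Estimate (x, y, scale) of a UI element with a basic particle filter.

    score(x, y, s) is a hypothetical non-negative match likelihood.
    bounds is ((x_min, x_max), (y_min, y_max), (s_min, s_max)).
    """
    rng = random.Random(seed)
    # Start with hypotheses spread uniformly over position and scale.
    particles = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                 for _ in range(n_particles)]
    for it in range(n_iters):
        # Weight each hypothesis by how well it matches (floor avoids all-zero weights).
        weights = [score(*p) + 1e-12 for p in particles]
        # Resample hypotheses in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Jitter the survivors, shrinking the noise each round, and clamp to bounds.
        decay = 0.7 ** it
        particles = [
            tuple(min(max(v + rng.gauss(0.0, sd * decay), lo), hi)
                  for v, sd, (lo, hi) in zip(p, noise, bounds))
            for p in particles
        ]
    # Use the mean of the final particle set as the point estimate.
    n = float(len(particles))
    return tuple(sum(p[i] for p in particles) / n for i in range(3))


def toy_score(x, y, s, target=(640.0, 360.0, 1.25)):
    """Synthetic likelihood: a Gaussian bump around an assumed true location and scale."""
    dx, dy, ds = x - target[0], y - target[1], s - target[2]
    return math.exp(-(dx * dx + dy * dy) / (2 * 80.0 ** 2)
                    - (ds * ds) / (2 * 0.2 ** 2))


estimate = smc_localize(toy_score, ((0, 1280), (0, 720), (0.5, 2.0)))
print(estimate)
```

Because the scale is one of the estimated quantities, a rescaled window in this toy setup simply shifts where the particles concentrate. In practice the score would come from comparing the demonstrated element against the current screenshot.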
Problem

Research questions and friction points this paper is trying to address.

Robotic Process Automation
GUI Automation
Deterministic Execution
Vision-based Automation
Enterprise Workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

GUI Process Automation
Sequential Monte Carlo
Readiness Calibration
Vision-based RPA
Local Execution