AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions

📅 2026-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high inference overhead of invoking large language models (LLMs) at every step of repetitive GUI automation tasks. Traditional robotic process automation (RPA) avoids that overhead but requires extensive manual scripting. To bridge this gap, the authors propose AutoRPA, a two-stage translator-builder framework: it first distills reusable, robust RPA functions from the interaction traces of ReAct-style LLM agents, then iteratively refines those functions through execution validation with an LLM fallback. The approach combines retrieval-augmented generation, code synthesis, and a hybrid verification strategy, and it is evaluated across multiple GUI environments. When similar tasks are executed with the synthesized functions, token consumption drops by 82% to 96%, improving runtime efficiency and code reusability.
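The execute-then-fall-back idea in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: `run_with_fallback`, `RunResult`, and the callable signatures are hypothetical names chosen for illustration, and the ReAct agent is assumed to return a success flag together with the number of LLM calls it made.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class RunResult:
    success: bool
    used_fallback: bool
    llm_calls: int


def run_with_fallback(task: str,
                      rpa_function: Callable[[str], bool],
                      react_agent: Callable[[str], Tuple[bool, int]],
                      failure_log: List[Tuple[str, str]]) -> RunResult:
    """Try the scripted function first; use step-by-step LLM reasoning only on failure.

    All names here are hypothetical. Failures are appended to failure_log so that
    a later refinement step could inspect them.
    """
    try:
        if rpa_function(task):
            # Scripted path succeeded, so no LLM call was needed.
            return RunResult(True, False, 0)
        reason = "rpa function reported failure"
    except Exception as exc:  # e.g. a GUI element was not found
        reason = repr(exc)
    failure_log.append((task, reason))
    # Fallback: let the (assumed) ReAct-style agent solve the task.
    success, llm_calls = react_agent(task)
    return RunResult(success, True, llm_calls)
```

In this sketch the scripted path costs no LLM calls at all, which is the intuition behind the reported token savings; the logged failures stand in for whatever signal the real system feeds back into refinement.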
📝 Abstract
Large language model (LLM)-based agents have demonstrated proficiency in multi-step interactions with graphical user interfaces (GUIs). While most research focuses on improving single-task performance, practical scenarios often involve repetitive GUI tasks, for which repeatedly invoking LLM reasoning, as in the ReAct paradigm, is inefficient. Traditional Robotic Process Automation (RPA), which predates LLMs, offers runtime efficiency but demands significant manual effort to develop and maintain. To bridge this gap, we propose AutoRPA, a framework that automatically distills the decision logic of ReAct-style agents into robust RPA functions. AutoRPA introduces two core innovations: (1) a translator-builder pipeline, in which a translator agent converts hard-coded ReAct actions into soft-coded procedures and a builder agent synthesizes robust RPA functions via retrieval-augmented generation over multiple trajectories; and (2) a hybrid repair strategy during code verification, which combines RPA execution with ReAct-based fallback for iterative refinement. Experiments across multiple GUI environments demonstrate that RPA functions generated by AutoRPA successfully solve similar tasks while reducing token usage by 82% to 96%, significantly improving runtime efficiency and reusability.
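To make the difference between hard-coded actions and soft-coded procedures concrete, here is a small Python sketch. It is a deterministic stand-in for illustration only: in the paper the translation is performed by an LLM agent, whereas the hypothetical `soften` and `replay` helpers below simply swap literal values that came from the task for named placeholders. The action tuple layout and all names are assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical action format: (verb, target element, typed value)
Action = Tuple[str, str, str]


def soften(trace: List[Action], task_args: Dict[str, str]) -> List[Action]:
    """Replace literal values that originated from the task with {placeholders}."""
    value_to_name = {value: name for name, value in task_args.items()}
    template = []
    for verb, target, value in trace:
        if value in value_to_name:
            value = "{" + value_to_name[value] + "}"
        template.append((verb, target, value))
    return template


def replay(template: List[Action], task_args: Dict[str, str]) -> List[Action]:
    """Instantiate the soft-coded template for a new, similar task."""
    return [(verb, target, value.format(**task_args))
            for verb, target, value in template]


# A hard-coded trace recorded for one concrete task.
trace = [("type", "search_box", "laptop"), ("click", "search_button", "")]
template = soften(trace, {"query": "laptop"})
new_actions = replay(template, {"query": "phone"})
```

Once the literal `"laptop"` becomes the `{query}` placeholder, the same procedure can be replayed for a different query without any further LLM reasoning, which is what makes the resulting function reusable across similar tasks.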
Problem

Research questions and friction points this paper is trying to address.

GUI Automation
Large Language Model
Robotic Process Automation
Code Synthesis
ReAct Paradigm
Innovation

Methods, ideas, or system contributions that make the work stand out.

AutoRPA
LLM-driven code synthesis
Robotic Process Automation
retrieval-augmented generation
hybrid repair strategy
Minghao Chen
Hangzhou Dianzi University
Deep Learning, Domain Adaptation, Vision and Language, LLM Agents
Xinyi Hu
Zhejiang Key Laboratory of Space Information Sensing and Transmission, School of Computer Science, Hangzhou Dianzi University, China
Zhou Yu
Zhejiang Key Laboratory of Space Information Sensing and Transmission, School of Computer Science, Hangzhou Dianzi University, China
Yufei Yin
Zhejiang Key Laboratory of Space Information Sensing and Transmission, School of Computer Science, Hangzhou Dianzi University, China