PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts

📅 2024-04-03
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Smartphone GUI automation has long had a high barrier to entry because it requires expert scripting and workflow design. This paper proposes PromptRPA, a mobile-oriented, multi-agent prompt-to-RPA framework that generates and executes robotic process automation (RPA) workflows end to end from natural-language prompts, such as task goals or procedural descriptions, without manual coding. The framework combines large language models (LLMs), GUI state perception, hierarchical action planning, and feedback mechanisms to emulate human-like cognition: interpreting user intent, integrating external information, and closing the loop on device interaction. Its agents also learn continually from user feedback and accumulate knowledge over time. Evaluated on real-world smartphone tasks, the framework raises the task success rate from 22.28% for the baseline to 95.21% and requires an average of only 1.66 user interventions per new task, substantially lowering the adoption barrier for mobile RPA.

📝 Abstract
Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI), by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.21% with PromptRPA, requiring an average of 1.66 user interventions for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.
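The division of labor described in the abstract (interpreting intent, managing information for RPA generation, executing operations, and learning from user feedback) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`KnowledgeStore`, `interpret_intent`, `plan_steps`, `execute`, `run_task`) is hypothetical, the planning agent is a placeholder rather than an LLM call, and execution only logs steps instead of operating a phone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class KnowledgeStore:
    """Hypothetical memory of step sequences corrected by the user."""
    corrections: Dict[str, List[str]] = field(default_factory=dict)

    def lookup(self, goal: str) -> Optional[List[str]]:
        return self.corrections.get(goal)

    def record(self, goal: str, steps: List[str]) -> None:
        self.corrections[goal] = list(steps)


def interpret_intent(prompt: str) -> str:
    """Stand-in for an intent agent: normalise the prompt into a goal string."""
    return " ".join(prompt.lower().split())


def plan_steps(goal: str, store: KnowledgeStore) -> List[str]:
    """Stand-in for an information-managing agent: prefer learned steps."""
    learned = store.lookup(goal)
    if learned is not None:
        return list(learned)
    # Placeholder plan; a real system would consult an LLM and the GUI state.
    return [f"open app for: {goal}", f"attempt: {goal}"]


def execute(steps: List[str]) -> List[str]:
    """Stand-in for an execution agent: log each step instead of tapping."""
    return [f"done: {step}" for step in steps]


def run_task(prompt: str, store: KnowledgeStore,
             user_fix: Optional[List[str]] = None) -> List[str]:
    """Run one task; a user intervention overrides the plan and is remembered."""
    goal = interpret_intent(prompt)
    steps = plan_steps(goal, store)
    if user_fix is not None:
        store.record(goal, user_fix)
        steps = list(user_fix)
    return execute(steps)
```

In this sketch, a correction supplied once through `user_fix` is reused the next time the same goal is requested, which mirrors, in a very simplified form, the idea of needing fewer interventions as knowledge accumulates.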
Problem

Research questions and friction points this paper is trying to address.

Automates smartphone UI tasks from textual prompts
Reduces need for scripting and workflow expertise
Improves task success rate with minimal user intervention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates smartphone UI tasks from text prompts
Uses intelligent agents to interpret user intent
Improves performance through user feedback learning
Tian Huang
Department of Computer Science and Technology, Tsinghua University, China
Chun Yu
Department of Computer Science and Technology, Tsinghua University, China
Weinan Shi
Tsinghua University
HCI
Zijian Peng
Department of Computer Science and Technology, Tsinghua University, China
David Yang
Department of Computer Science and Technology, Tsinghua University, China
Weiqi Sun
Department of Public Administration, Sichuan University, China
Yuanchun Shi
Professor
human computer interaction