🤖 AI Summary
Smartphone GUI automation has long suffered from high barriers to entry due to the need for expert scripting and workflow design. This paper proposes the first mobile-oriented, multi-agent Prompt-to-RPA framework that enables end-to-end generation and execution of robotic process automation (RPA) workflows directly from natural-language prompts (such as task goals or procedural descriptions) without manual coding. The framework integrates large language models (LLMs), GUI state perception, hierarchical action planning, and reinforcement-based feedback mechanisms to emulate human-like cognition: interpreting user intent, integrating external information, and closing the loop on device interaction. It further supports continual learning and knowledge evolution via user feedback. Evaluated on real-world smartphone tasks, the framework achieves a task success rate of 95.21%, up from 22.28% for the baseline, and requires only 1.66 user interventions on average per new task, substantially lowering the adoption barrier for mobile RPA.
📝 Abstract
Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on the graphical user interface (GUI) by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures) and to generate and perform the corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.21% with PromptRPA, with an average of 1.66 user interventions required for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.