🤖 AI Summary
Conventional AI planners have difficulty interpreting complex office tasks described in natural language. To address this, the paper proposes an end-to-end human–AI collaborative automation framework that integrates large language models (LLMs) with symbolic planning. Methodologically, it couples LLM-based semantic parsing with a classical planning engine (e.g., FF/PDDL), a task execution module, and closed-loop state monitoring with real-time feedback. The result is a complete pipeline from natural language instruction to action sequence generation, execution, and dynamic correction. Experiments demonstrate high success rates in automating diverse office tasks while substantially reducing user effort. The core contribution is an office automation paradigm built on a four-stage workflow: semantic understanding, symbolic reasoning, real-time execution, and online feedback. Together these stages support the feasibility and practicality of human–AI collaborative workflow automation.
📝 Abstract
Classical AI planning techniques generate sequences of actions for complex tasks. However, they cannot interpret planning tasks that are specified in natural language. The advent of Large Language Models (LLMs) has introduced novel capabilities in human-computer interaction. In the context of planning tasks, LLMs have been shown to be particularly good at interpreting human intents, among other uses. This paper introduces GenPlanX, which integrates LLMs for natural language-based description of planning tasks with a classical AI planning engine and an execution and monitoring framework. We demonstrate the efficacy of GenPlanX in assisting users with office-related tasks, highlighting its potential to streamline workflows and enhance productivity through seamless human-AI collaboration.
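The pipeline described in the abstract (natural language request, classical planning, then execution with monitoring) can be illustrated with a minimal sketch. This is not GenPlanX's implementation: the keyword-based `parse_request` is a hypothetical stand-in for the LLM step, the breadth-first `plan` function is a stand-in for a classical planning engine, and the action names and the `run` loop are assumptions made for illustration only.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def parse_request(text):
    """Hypothetical stand-in for the LLM step: map a request to goal facts."""
    lowered = text.lower()
    if "send" in lowered and "report" in lowered:
        return frozenset({"report_sent"})
    raise ValueError("request not understood")


def plan(state, goal, actions):
    """Breadth-first forward search; a toy stand-in for a classical planner."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        current, steps = queue.popleft()
        if goal <= current:
            return steps
        for a in actions:
            if a.pre <= current:
                nxt = (current - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [a]))
    return None  # goal unreachable from this state


def run(text, state, actions, execute, max_replans=3):
    """Plan, execute step by step, and replan from the observed state on failure."""
    goal = parse_request(text)
    log = []
    for _ in range(max_replans + 1):
        steps = plan(state, goal, actions)
        if steps is None:
            return False, log
        for a in steps:
            if not execute(a):
                break  # monitoring saw a failure: replan from the current state
            state = (state - a.delete) | a.add
            log.append(a.name)
        else:
            return True, log  # every step of the plan succeeded
    return False, log


# Hypothetical office domain with three actions.
ACTIONS = [
    Action("open_file", frozenset(), frozenset({"file_open"}), frozenset()),
    Action("write_report", frozenset({"file_open"}),
           frozenset({"report_written"}), frozenset()),
    Action("send_email", frozenset({"report_written"}),
           frozenset({"report_sent"}), frozenset()),
]


def make_flaky_executor():
    """Executor that fails the first attempt at send_email, then succeeds."""
    failed = set()

    def execute(action):
        if action.name == "send_email" and action.name not in failed:
            failed.add(action.name)
            return False
        return True

    return execute


success, log = run("Please send the weekly report", frozenset(),
                   ACTIONS, make_flaky_executor())
```

In this toy run the first attempt to send the email fails, the loop replans from the state reached so far, and the second, shorter plan completes the task. A real system would replace the keyword matcher with an LLM call and the search with a PDDL planner; the sketch shows only how the stages might connect.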