🤖 AI Summary
To address the high cost and scalability limitations of expert-authored Excel tutorials, this paper proposes the first end-to-end automated framework that generates functionally complete, executable Excel tutorials directly from natural language task descriptions. Methodologically, it employs a large language model (LLM)-driven execution agent that autonomously plans action sequences, executes operations in a real Excel environment, and concurrently produces structured documentation and video demonstrations, without requiring human-provided step-by-step instructions or exemplar materials. Key contributions include: (1) the first fully automated Excel tutorial generation system; (2) a hybrid LLM-human evaluation framework for comprehensive quality assessment; and (3) empirical results showing an 8.5% improvement in task execution success rate over state-of-the-art baselines, readability and pedagogical effectiveness on par with expert-authored tutorials, and a 20× reduction in generation time, thereby validating the feasibility of large-scale, high-quality tutorial production.
📝 Abstract
Excel is one of the most widely used productivity tools across domains, offering rich functionality but also overwhelming users with its complexity. This creates a persistent demand for tutorials to support effective usage. However, existing tutorials are manually authored by experts, require frequent updates after each software release, and incur substantial labor costs. Prior work has not achieved fully automated tutorial generation, since existing methods still depend on handcrafted operation sequences or example materials. In this paper, we present the first framework for automatically generating Excel tutorials directly from natural language task descriptions. Our framework first instantiates the task; its central component, the Execution Agent, then plans and executes the solution in Excel and collects the intermediate artifacts required for tutorial construction. These artifacts are then transformed into both structured Excel documents and video demonstrations. To build a comprehensive tutorial corpus, we collected 1,559 task descriptions from real-world scenarios. In addition, we designed a systematic evaluation framework that integrates assessments from both large language models (LLMs) and human reviewers. Experimental results show that our framework improves task execution success rates by 8.5% over state-of-the-art baselines. Moreover, the generated tutorials demonstrate superior readability and instructional effectiveness, often approaching or surpassing expert-authored materials. Importantly, the automated pipeline eliminates manual labor and reduces time costs to 1/20 of expert authoring, making scalable and high-quality tutorial generation practical for the first time.
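The pipeline in the abstract (instantiate task → plan and execute steps in Excel → collect per-step artifacts → render them into a tutorial) can be sketched in miniature. This is not the paper's implementation: the real system drives an actual Excel environment with an LLM planner, whereas here the planner is a fixed stub, the spreadsheet is a plain dictionary, and all names (`ExecutionAgent`, `Artifact`, `render_tutorial`) are hypothetical illustrations of the loop structure.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    """Intermediate record captured after each executed step,
    later used to build the document/video tutorial."""
    step: int
    action: str
    snapshot: dict  # cell -> value after this step

class ExecutionAgent:
    """Toy stand-in for the paper's Execution Agent."""

    def __init__(self):
        self.sheet = {}       # toy Excel environment: cell -> value
        self.artifacts = []   # collected for tutorial construction

    def plan(self, task: str):
        # Stand-in for LLM planning: a fixed action sequence
        # for one demo task (real system plans from the description).
        return [
            ("write", "A1", 10),
            ("write", "A2", 32),
            ("sum", "A3", ["A1", "A2"]),
        ]

    def execute(self, task: str):
        # Execute each planned operation and record an artifact,
        # mirroring the plan-execute-collect loop in the abstract.
        for i, (op, cell, arg) in enumerate(self.plan(task), 1):
            if op == "write":
                self.sheet[cell] = arg
            elif op == "sum":
                self.sheet[cell] = sum(self.sheet[c] for c in arg)
            self.artifacts.append(
                Artifact(i, f"{op} {cell}", dict(self.sheet)))
        return self.sheet

    def render_tutorial(self):
        # Transform collected artifacts into structured step text.
        return "\n".join(
            f"Step {a.step}: {a.action}" for a in self.artifacts)

agent = ExecutionAgent()
agent.execute("Sum two numbers in a column")
print(agent.render_tutorial())
```

The key design point the sketch preserves is that tutorial content is a byproduct of real execution: each step is applied to the environment first, and the tutorial is rendered from recorded state, so the instructions cannot drift from what was actually performed.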