🤖 AI Summary
Prior work on large language models (LLMs) for presentations focuses largely on slide generation and overlooks the frequent need to edit existing PowerPoint slides. This paper introduces Talk-to-Your-Slides, an LLM-driven agent that edits slides conversationally inside an active PowerPoint session. Our approach uses a two-layer architecture (LLM-based task planning followed by Python COM automation) to move beyond rigid, predefined operations and support context-aware, fine-grained editing driven by natural language. To evaluate such agents, we construct TSBench, a human-annotated benchmark for slide editing comprising 379 diverse editing instructions with corresponding slide variations. Experiments show that our method significantly outperforms baselines in execution success rate, instruction fidelity, and editing efficiency. Both the implementation code and TSBench are publicly released.
📝 Abstract
Existing research on large language models (LLMs) for PowerPoint predominantly focuses on slide generation, overlooking the common yet tedious task of editing existing slides. We introduce Talk-to-Your-Slides, an LLM-powered agent that edits slides directly within an active PowerPoint session through the Component Object Model (COM) interface. Our system takes a two-level approach: (1) high-level processing, in which an LLM agent interprets the instruction and formulates an editing plan, and (2) low-level execution, in which Python scripts directly manipulate PowerPoint objects. Unlike previous methods that rely on predefined operations, our approach enables more flexible and contextually aware editing. To facilitate evaluation, we present TSBench, a human-annotated dataset of 379 diverse editing instructions with corresponding slide variations. Experimental results demonstrate that Talk-to-Your-Slides significantly outperforms baseline methods in execution success rate, instruction fidelity, and editing efficiency. Our code and benchmark are available at https://anonymous.4open.science/r/talk-to-your-slides/
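The two-level design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `Shape`, `Slide`, `plan_edits`, `generate_script`, and `execute_plan` are hypothetical names, the planner and script generator are hard-coded stand-ins for LLM calls, and the slide is an in-memory object rather than a live PowerPoint object. In a real session, one way to attach to PowerPoint would be pywin32's `win32com.client.GetActiveObject("PowerPoint.Application")`, which requires Windows and a running PowerPoint instance.

```python
from dataclasses import dataclass, field


@dataclass
class Shape:
    """Hypothetical in-memory stand-in for a PowerPoint shape (not a COM object)."""
    name: str
    text: str = ""


@dataclass
class Slide:
    """Hypothetical stand-in for a slide: shapes keyed by name."""
    shapes: dict = field(default_factory=dict)


def plan_edits(instruction: str) -> list:
    """High level: stand-in for the LLM planner.

    A real planner would prompt an LLM with the instruction and the slide
    contents; a fixed rule is used here so the sketch runs offline.
    """
    lowered = instruction.lower()
    if "title" in lowered and "uppercase" in lowered:
        return [{"target": "Title", "action": "uppercase"}]
    return []


def generate_script(step: dict) -> str:
    """Stand-in for LLM code generation: return a Python snippet for one step."""
    if step["action"] == "uppercase":
        return (
            f"shape = slide.shapes[{step['target']!r}]\n"
            "shape.text = shape.text.upper()\n"
        )
    raise ValueError(f"no script for action {step['action']!r}")


def execute_plan(slide: Slide, plan: list) -> int:
    """Low level: run each generated script against the slide.

    Returns the number of steps that ran without raising.
    """
    applied = 0
    for step in plan:
        try:
            script = generate_script(step)
            # The script sees only the `slide` name (plus Python builtins).
            exec(script, {"slide": slide})
        except Exception:
            continue  # a failed step is skipped rather than aborting the plan
        applied += 1
    return applied


slide = Slide(shapes={"Title": Shape("Title", "Quarterly results")})
plan = plan_edits("Make the title uppercase")
applied = execute_plan(slide, plan)
```

Because the low-level step runs a generated script instead of dispatching to a fixed set of operations, new kinds of edits need no executor changes. The trade-off is that running generated code calls for sandboxing and validation, which this sketch omits.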