Talk to Your Slides: Efficient Slide Editing Agent with Large Language Models

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work on large language models (LLMs) for presentations focuses largely on slide generation and overlooks the frequent need to edit existing PowerPoint slides. This paper introduces Talk-to-Your-Slides, an LLM-driven agent that edits slides conversationally inside an active PowerPoint session. The approach uses a two-level architecture, with LLM-based task planning followed by execution through Python COM automation, so that edits are not limited to predefined operations and can be context-aware, fine-grained, and driven by natural language. To evaluate such agents, the authors construct TSBench, a human-annotated benchmark of 379 diverse editing instructions with corresponding slide variations. Experiments show that the method significantly outperforms baselines in execution success rate, instruction fidelity, and editing efficiency. The implementation code and TSBench are publicly released.

📝 Abstract
Existing research on large language models (LLMs) for PowerPoint predominantly focuses on slide generation, overlooking the common yet tedious task of editing existing slides. We introduce Talk-to-Your-Slides, an LLM-powered agent that directly edits slides within active PowerPoint sessions through COM communication. Our system employs a two-level approach: (1) high-level processing, where an LLM agent interprets instructions and formulates editing plans, and (2) low-level execution, where Python scripts directly manipulate PowerPoint objects. Unlike previous methods that rely on predefined operations, our approach enables more flexible and contextually aware editing. To facilitate evaluation, we present TSBench, a human-annotated dataset of 379 diverse editing instructions with corresponding slide variations. Experimental results demonstrate that Talk-to-Your-Slides significantly outperforms baseline methods in execution success rate, instruction fidelity, and editing efficiency. Our code and benchmark are available at https://anonymous.4open.science/r/talk-to-your-slides/
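
The two-level split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `EditStep` schema, the `plan_edits` stub (standing in for the LLM planner), and the in-memory `Shape` class (standing in for PowerPoint objects reached over COM) are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """In-memory stand-in for a PowerPoint shape (assumption, not a COM object)."""
    name: str
    text: str
    font_size: int = 18


@dataclass
class EditStep:
    """One step of a hypothetical editing plan."""
    target: str    # name of the shape the step applies to
    action: str    # "set_text" or "set_font_size"
    value: object  # new text or new font size


def plan_edits(instruction, shapes):
    """High level: turn a natural-language instruction into a list of steps.

    A real system would prompt an LLM here. This stub recognises a single
    hard-coded instruction pattern so the sketch runs offline.
    """
    steps = []
    if instruction.lower().startswith("uppercase"):
        for shape in shapes:
            if shape.name.startswith("Title"):
                steps.append(EditStep(shape.name, "set_text", shape.text.upper()))
    return steps


def execute_plan(steps, shapes):
    """Low level: apply each planned step to the slide objects.

    Returns the number of steps that were actually applied; steps with an
    unknown target or action are skipped.
    """
    by_name = {shape.name: shape for shape in shapes}
    applied = 0
    for step in steps:
        shape = by_name.get(step.target)
        if shape is None:
            continue
        if step.action == "set_text":
            shape.text = str(step.value)
        elif step.action == "set_font_size":
            shape.font_size = int(step.value)
        else:
            continue
        applied += 1
    return applied
```

Keeping the plan as plain data between the two levels means it can be inspected or logged before anything touches the slide, which is one plausible reason to separate planning from execution.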
Problem

Research questions and friction points this paper is trying to address.

Editing existing slides is tedious and overlooked in LLM research
Current methods lack flexibility for context-aware slide editing
No standardized benchmark exists for evaluating slide editing agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered agent edits slides via COM communication
Two-level approach: high-level planning and low-level execution
Python scripts directly manipulate PowerPoint objects
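
As a rough illustration of what manipulating PowerPoint objects over COM can look like from Python, the sketch below walks the slides of a presentation and replaces text in every shape that has a text frame. It assumes Windows, a running PowerPoint instance, and the `pywin32` package; the helper names `replace_text_in_presentation` and `get_active_presentation` are hypothetical and do not come from the paper.

```python
def replace_text_in_presentation(presentation, old, new):
    """Replace `old` with `new` in every text-bearing shape.

    `presentation` is expected to expose the PowerPoint object model
    (Slides, Shapes, HasTextFrame, TextFrame.TextRange.Text).
    Returns the number of shapes whose text changed.
    """
    changed = 0
    for slide in presentation.Slides:
        for shape in slide.Shapes:
            # Over COM, HasTextFrame is 0 when the shape has no text frame.
            if not shape.HasTextFrame:
                continue
            text_range = shape.TextFrame.TextRange
            if old in text_range.Text:
                text_range.Text = text_range.Text.replace(old, new)
                changed += 1
    return changed


def get_active_presentation():
    """Attach to a running PowerPoint instance (Windows with pywin32 only)."""
    # Imported lazily so the helper above can be used without pywin32.
    import win32com.client

    app = win32com.client.GetActiveObject("PowerPoint.Application")
    return app.ActivePresentation
```

On a Windows machine with a deck open, `replace_text_in_presentation(get_active_presentation(), "Q1", "Q2")` would edit the live session. Assigning a whole new string to `TextRange.Text` may discard per-run formatting, so a production agent would likely need finer-grained edits.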