🤖 AI Summary
Existing approaches to automated slide updating rely on fixed templates, which makes them ill-suited to updating user-customized, analytics-style presentations. To address this limitation, this work formalizes, for the first time, the task of dynamic slide updating on user-provided templates and proposes SlideAgent, an agent-based framework for that task. SlideAgent combines multimodal slide parsing, natural language instruction understanding, and tool-augmented reasoning to update tables, charts, and textual conclusions while preserving the original layout and visual style. The authors also introduce DynaSlide, a large-scale benchmark of 20,036 real-world instruction-execution triples (source slide, user instruction, target slide), along with end-to-end and component-level evaluation protocols. Experimental results show that SlideAgent establishes a strong baseline on this benchmark, supporting the feasibility of the proposed approach.
📝 Abstract
Presentation slides are a primary medium for data-driven reporting, yet keeping complex, analytics-style decks up to date remains labor-intensive. Existing automation methods mostly rely on fixed-template filling and cannot support dynamic updates for diverse, user-authored slide decks. We therefore define "Dynamic Slide Update via Natural Language Instructions on User-provided Templates" and introduce DynaSlide, a large-scale benchmark with 20,036 real-world instruction-execution triples (source slide, user instruction, target slide) grounded in a shared external database and built from business reporting slides under bring-your-own-template (BYO-template) conditions. To tackle this task, we propose SlideAgent, an agent-based framework that combines multimodal slide parsing, natural language instruction grounding, and tool-augmented reasoning for tables, charts, and textual conclusions. SlideAgent updates content while preserving layout and style, providing a strong reference baseline on DynaSlide. We further design end-to-end and component-level evaluation protocols that reveal key challenges and opportunities for future research. The dataset and code are available at https://github.com/XiaoZhou2024/SlideAgent.
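As a rough illustration of the triple structure described in the abstract, one benchmark record could be modeled as below. This is only a sketch: the class name `UpdateTriple`, its field names, and the example file names and instruction text are hypothetical and are not taken from the DynaSlide dataset or the SlideAgent code.

```python
from typing import NamedTuple


class UpdateTriple(NamedTuple):
    """Hypothetical container for one (source slide, user instruction, target slide) record."""

    source_slide: str  # assumed: path or ID of the slide before the update
    instruction: str   # natural language update request from the user
    target_slide: str  # assumed: path or ID of the expected slide after the update


# Illustrative values only; these are not real DynaSlide entries.
example = UpdateTriple(
    source_slide="q1_report.pptx",
    instruction="Update the revenue table and chart to the Q2 figures.",
    target_slide="q2_report.pptx",
)

# A NamedTuple unpacks in field order, mirroring the triple notation above.
source, instruction, target = example
```

Under this sketch, a system would take `source` and `instruction` as input, and its output would be compared against `target`.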