DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing benchmarks struggle to comprehensively evaluate content faithfulness, structural coherence, layout rationality, and instruction following in the automatic generation and multi-turn editing of academic slides. To address this gap, this work proposes DECKBench, the first standardized evaluation framework for this task, featuring a curated paper-to-slide paired dataset and simulated editing instructions. The framework introduces multidimensional metrics spanning both slide-level and deck-level perspectives and includes a modular multi-agent baseline system with modules for paper parsing, slide planning, HTML rendering, and iterative editing. Experimental results demonstrate that DECKBench effectively reveals the strengths, weaknesses, and failure modes of current approaches, offering actionable insights for advancing multi-agent systems in academic slide generation and editing.
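
To make the baseline's decomposition concrete, the sketch below walks through the four-stage pipeline the summary describes (paper parsing, slide planning, HTML rendering, iterative editing). The class and function names, the parsing heuristics, and the toy "delete slide N" edit command are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical stage interfaces; DECKBench's real baseline likely uses
# LLM-driven agents rather than the simple heuristics shown here.

@dataclass
class Slide:
    title: str
    bullets: list[str] = field(default_factory=list)
    html: str = ""

def parse_paper(paper_text: str) -> dict:
    """Stage 1: split the source paper into sections with their sentences."""
    sections: dict[str, list[str]] = {}
    current = "abstract"
    for line in paper_text.splitlines():
        if line.strip() and line.isupper():      # crude all-caps heading heuristic
            current = line.strip().lower()
            sections[current] = []
        else:
            sections.setdefault(current, []).append(line.strip())
    return sections

def plan_slides(sections: dict, max_slides: int = 10) -> list[Slide]:
    """Stage 2: map parsed sections onto a slide outline."""
    slides = []
    for name, lines in list(sections.items())[:max_slides]:
        bullets = [l for l in lines if l][:4]    # keep a few salient points per slide
        slides.append(Slide(title=name.title(), bullets=bullets))
    return slides

def render_html(slides: list[Slide]) -> list[Slide]:
    """Stage 3: render each planned slide as a standalone HTML fragment."""
    for s in slides:
        items = "".join(f"<li>{b}</li>" for b in s.bullets)
        s.html = f"<section><h1>{s.title}</h1><ul>{items}</ul></section>"
    return slides

def apply_edit(slides: list[Slide], instruction: str) -> list[Slide]:
    """Stage 4: apply one simulated editing instruction.
    Only a toy 'delete slide N' command is handled for illustration."""
    if instruction.startswith("delete slide "):
        idx = int(instruction.split()[-1]) - 1
        if 0 <= idx < len(slides):
            slides.pop(idx)
    return slides

if __name__ == "__main__":
    deck = render_html(plan_slides(parse_paper("INTRODUCTION\nWe study slide generation.")))
    deck = apply_edit(deck, "delete slide 1")
    print(len(deck), "slides remain")
```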

📝 Abstract
Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do not adequately measure these challenges. To address this gap, we introduce the Deck Edits and Compliance Kit Benchmark (DECKBench), an evaluation framework for multi-agent slide generation and editing. DECKBench is built on a curated dataset of paper-to-slide pairs augmented with realistic, simulated editing instructions. Our evaluation protocol systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following. We further implement a modular multi-agent baseline system that decomposes the slide generation and editing task into paper parsing and summarization, slide planning, HTML creation, and iterative editing. Experimental results demonstrate that the proposed benchmark highlights strengths, exposes failure modes, and provides actionable insights for improving multi-agent slide generation and editing systems. Overall, this work establishes a standardized foundation for reproducible and comparable evaluation of academic presentation generation and editing. Code and data are publicly available at https://github.com/morgan-heisler/DeckBench.
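
As a concrete illustration of the slide-level versus deck-level evaluation the abstract describes, the sketch below scores a deck with simple lexical-overlap metrics and a toy instruction-following check. The function names and the overlap-based scoring are assumptions made for illustration; they do not reproduce DECKBench's actual protocol, which the paper defines in full.

```python
# Simplified scoring sketch: slide-level fidelity, deck-level coverage,
# and a toy multi-turn instruction-following check.

def slide_fidelity(slide_text: str, source_text: str) -> float:
    """Slide-level fidelity: fraction of slide tokens grounded in the source paper."""
    slide_tokens = set(slide_text.lower().split())
    source_tokens = set(source_text.lower().split())
    if not slide_tokens:
        return 0.0
    return len(slide_tokens & source_tokens) / len(slide_tokens)

def deck_coverage(slide_texts: list[str], key_points: list[str]) -> float:
    """Deck-level coverage: fraction of reference key points mentioned somewhere in the deck."""
    deck = " ".join(slide_texts).lower()
    hit = sum(1 for point in key_points if point.lower() in deck)
    return hit / len(key_points) if key_points else 0.0

def instruction_followed(before: list[str], after: list[str], instruction: str) -> bool:
    """Editing check: did the deck change in the way the instruction asked?
    Only a toy 'remove mentions of X' instruction is handled here."""
    if instruction.startswith("remove mentions of "):
        target = instruction[len("remove mentions of "):].lower()
        return all(target not in s.lower() for s in after)
    return before != after  # fallback: at least something changed

if __name__ == "__main__":
    deck = ["DECKBench evaluates slide generation", "We report fidelity and coverage"]
    print(round(deck_coverage(deck, ["fidelity", "coverage", "layout"]), 2))        # 0.67
    print(instruction_followed(deck, deck[:1], "remove mentions of coverage"))      # True
```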
Problem

Research questions and friction points this paper is trying to address.

academic slide generation
multi-agent frameworks
benchmarking
instruction following
slide editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework
slide generation
instruction following
benchmarking
layout-aware rendering