PASS: Presentation Automation for Slide Generation and Speech

📅 2025-01-11
🤖 AI Summary
Existing automated presentation generation methods are built largely around research papers as input, which limits their generalizability to generic Word documents, and they do not jointly generate slides and speech end to end. As a result, users still face time-consuming manual authoring, incoherent narration, and poor audience immersion. This paper proposes the first end-to-end framework for jointly generating slides and spoken commentary from generic documents. It integrates information extraction, structured summarization, multimodal content planning, and text-to-speech (TTS) synthesis, driven by a fine-tuned large language model (LLM) for document understanding and narrative generation. It also introduces the first automated evaluation metric covering three dimensions: relevance, coherence, and redundancy. Experiments show significant quality improvements, with strong agreement between automatic scores and human ratings (Spearman's ρ > 0.92). The code and dataset are publicly released to support reproducibility.
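The agreement figure quoted in the summary is a rank correlation. As a rough illustration of how such a number could be computed, the Python sketch below implements Spearman's ρ as the Pearson correlation of average ranks, so tied ratings share a rank. The function names `average_ranks` and `spearman_rho` and the scores in the usage example are hypothetical toy values, not data or code from PASS.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position holding the same value as position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    automatic = [4.5, 3.0, 4.0, 2.0, 5.0]  # hypothetical metric scores
    human = [4, 3, 5, 2, 5]                # hypothetical human ratings
    print(round(spearman_rho(automatic, human), 3))  # prints 0.821
```

Average ranks matter here because integer human ratings tie often; `scipy.stats.spearmanr` computes the same quantity if SciPy is available.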

📝 Abstract
In today's fast-paced world, effective presentations have become an essential tool for communication in both online and offline meetings. Crafting a compelling presentation requires significant time and effort, from gathering key insights to designing slides that convey information clearly and concisely. Despite the wealth of resources available, people often find themselves manually extracting crucial points, analyzing data, and organizing content so that it is clear and impactful. Furthermore, a successful presentation goes beyond the slides; it demands rehearsal and the ability to weave a captivating narrative that fully engages the audience. Although there has been some exploration of automating document-to-slide generation, existing research has largely centered on converting research papers. In addition, automating the delivery of these presentations has yet to be addressed. We introduce PASS, a pipeline that generates slides from general Word documents, not just research papers, and also automates the oral delivery of the generated slides. PASS analyzes user documents to create a dynamic, engaging presentation with an AI-generated voice. Additionally, we developed an LLM-based evaluation metric to assess our pipeline across three critical dimensions of presentations: relevance, coherence, and redundancy. The data and code are available at https://github.com/AggarwalTushar/PASS.
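The abstract describes the evaluation metric only as LLM-based. One common way to build such a metric is an LLM-as-judge prompt, and the sketch below assumes that approach. `PROMPT_TEMPLATE`, `evaluate_presentation`, the 1 to 5 scale, and the `llm` callable (any function mapping a prompt string to a reply string) are hypothetical illustrations, not PASS's actual prompt or code, which lives in the linked repository.

```python
import json
from typing import Callable

# The three dimensions named in the PASS abstract.
DIMENSIONS = ("relevance", "coherence", "redundancy")

# Hypothetical judge prompt; not the prompt used by PASS.
# Doubled braces survive str.format() as literal JSON braces.
PROMPT_TEMPLATE = (
    "You are evaluating a generated presentation.\n\n"
    "Source document:\n{document}\n\n"
    "Generated slides:\n{slides}\n\n"
    "Rate the slides from 1 (worst) to 5 (best) on relevance, coherence, "
    "and redundancy (5 = no repeated content). Reply with JSON only, e.g. "
    '{{"relevance": 4, "coherence": 3, "redundancy": 5}}'
)


def evaluate_presentation(
    document: str, slides: list[str], llm: Callable[[str], str]
) -> dict[str, int]:
    """Ask a judge model for 1-5 scores and validate its JSON reply."""
    numbered = "\n\n".join(f"Slide {i}: {text}" for i, text in enumerate(slides, start=1))
    reply = llm(PROMPT_TEMPLATE.format(document=document, slides=numbered))
    # json.JSONDecodeError (a ValueError subclass) is raised on non-JSON replies.
    scores = json.loads(reply)
    if not isinstance(scores, dict):
        raise ValueError("judge reply must be a JSON object")
    result = {}
    for dim in DIMENSIONS:
        value = scores.get(dim)
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"invalid or missing score for {dim!r}: {value!r}")
        result[dim] = value
    return result


if __name__ == "__main__":
    # Stand-in judge that returns a fixed reply, for demonstration only.
    def fake_llm(prompt: str) -> str:
        return '{"relevance": 4, "coherence": 5, "redundancy": 3}'

    print(evaluate_presentation("Quarterly report text", ["Revenue grew", "Costs fell"], fake_llm))
    # prints {'relevance': 4, 'coherence': 5, 'redundancy': 3}
```

Requesting JSON and validating every field keeps a malformed judge reply from silently turning into a score.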
Problem

Research questions and friction points this paper is trying to address.

Automatic Presentation Generation
Document Analysis
Data Visualization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated Presentation Generation
AI Voiceover
Quality Evaluation Tool