SlideTailor: Personalized Presentation Slide Generation for Scientific Papers

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
User preferences for academic presentations vary widely, and existing slide generators offer little personalization. To address this, the paper introduces a conditional slide generation task grounded in implicit preferences, expressed through a paper-slides example pair and a visual template rather than written instructions. It proposes SlideTailor, a human-behavior-inspired agentic framework that learns preferences from the provided examples, together with a chain-of-speech mechanism that aligns editable, structured slides with planned spoken narration. The framework distills implicit preferences from these unlabeled inputs and generalizes them across templates. On a new user-preference-aware benchmark, the method is reported to significantly outperform existing baselines: generated slides align better with users’ stylistic preferences and communicative intent, and the paired speech and slide output supports downstream applications such as video presentations.

📝 Abstract
Automatic presentation slide generation can greatly streamline content creation. However, since preferences of each user may vary, existing under-specified formulations often lead to suboptimal results that fail to align with individual user needs. We introduce a novel task that conditions paper-to-slides generation on user-specified preferences. We propose a human behavior-inspired agentic framework, SlideTailor, that progressively generates editable slides in a user-aligned manner. Instead of requiring users to write their preferences in detailed textual form, our system only asks for a paper-slides example pair and a visual template - natural and easy-to-provide artifacts that implicitly encode rich user preferences across content and visual style. Despite the implicit and unlabeled nature of these inputs, our framework effectively distills and generalizes the preferences to guide customized slide generation. We also introduce a novel chain-of-speech mechanism to align slide content with planned oral narration. Such a design significantly enhances the quality of generated slides and enables downstream applications like video presentations. To support this new task, we construct a benchmark dataset that captures diverse user preferences, with carefully designed interpretable metrics for robust evaluation. Extensive experiments demonstrate the effectiveness of our framework.
Problem

Research questions and friction points this paper is trying to address.

Generating personalized slides from scientific papers
Aligning slide content with user preferences and planned narration
Inferring user style from example pairs and templates instead of written instructions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates slides using user-provided example pairs and templates
Employs a chain-of-speech mechanism for narration-aligned content
Distills implicit preferences from unlabeled inputs for customization
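
The inputs and ordering described above can be sketched roughly as follows. This is a toy illustration only, not SlideTailor's implementation: every class, field, and function name is hypothetical, the "preference" is reduced to a bullet budget read off the example deck, and simple string slicing stands in for the generation steps. It shows just the shape of the task (paper plus example pair plus template in, slides with narration out) and the idea of planning speech before slide content.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PreferenceInputs:
    """The two user-supplied artifacts (field names are assumptions)."""
    example_paper: str               # a paper the user presented before
    example_slides: list[list[str]]  # that deck, as bullets per slide
    template_path: str               # visual template; unused in this toy


@dataclass
class Slide:
    title: str
    narration: str
    bullets: list[str]


def infer_bullet_budget(prefs: PreferenceInputs) -> int:
    # Toy stand-in for preference distillation: the longest slide in
    # the example deck sets the maximum bullets per generated slide.
    return max((len(s) for s in prefs.example_slides), default=3)


def generate(paper: dict[str, str], prefs: PreferenceInputs) -> list[Slide]:
    budget = infer_bullet_budget(prefs)
    deck = []
    for title, text in paper.items():
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        # Narration is planned first...
        narration = ". ".join(sentences[:budget]) + "."
        # ...and the slide bullets are then derived from it.
        bullets = [" ".join(s.split()[:6]) for s in sentences[:budget]]
        deck.append(Slide(title, narration, bullets))
    return deck
```

Calling `generate({"Intro": "..."}, prefs)` returns one `Slide` per paper section, each carrying both the narration and the bullets derived from it.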
Authors

Wenzheng Zeng
National University of Singapore
Computer Vision

Mingyu Ouyang
Department of Computer Science, National University of Singapore

Langyuan Cui
Department of Computer Science, National University of Singapore

Hwee Tou Ng
Provost's Chair Professor of Computer Science, National University of Singapore
Natural Language Processing, Computational Linguistics