🤖 AI Summary
This work addresses natural language–driven automatic slide generation. Methodologically, it proposes a program-generation–first paradigm and introduces SlidesBench, the first benchmark for this task (7k training / 585 test examples, derived from 310 slide decks across 10 domains). Building on an 8B Llama model, AutoPresent is trained via instruction tuning to generate Python code that renders slides, jointly optimizing content structuring and visual design. A dual-path evaluation framework combining reference-based and reference-free metrics is introduced, along with an iterative self-refinement mechanism. Experiments show that AutoPresent matches the closed-source GPT-4o; program-based generation significantly outperforms end-to-end image generation; and user studies attribute a 32% improvement in design quality to self-refinement. The core contributions are: (1) SlidesBench, the first dedicated slide-generation benchmark; (2) the program-generation–first paradigm; and (3) an efficient framework for iterative self-refinement.
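The iterative self-refinement mechanism mentioned above can be sketched as a simple generate–critique loop. The `generate` and `critique` callables below are hypothetical stand-ins for the model's generation and feedback steps, not the paper's actual interfaces:

```python
from typing import Callable, Optional

def self_refine(
    generate: Callable[[Optional[str]], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Produce slide code, then repeatedly revise it using critic feedback."""
    draft = generate(None)              # initial generation from the instruction
    for _ in range(max_rounds):
        feedback = critique(draft)      # None signals the critic is satisfied
        if feedback is None:
            break
        draft = generate(feedback)      # regenerate conditioned on the feedback
    return draft
```

For example, with a critic that asks once for a title and a generator that adds it, `self_refine` terminates after one revision round.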
📝 Abstract
Designing structured visuals such as presentation slides is essential for communicative needs, necessitating both content creation and visual planning skills. In this work, we tackle the challenge of automated slide generation, where models produce slide presentations from natural language (NL) instructions. We first introduce the SlidesBench benchmark, the first benchmark for slide generation with 7k training and 585 testing examples derived from 310 slide decks across 10 domains. SlidesBench supports evaluations that are (i) reference-based, to measure similarity to a target slide, and (ii) reference-free, to measure the design quality of generated slides alone. We benchmark end-to-end image generation and program generation methods with a variety of models, and find that programmatic methods produce higher-quality slides in user-interactable formats. Built on the success of program generation, we create AutoPresent, an 8B Llama-based model trained on 7k pairs of instructions paired with code for slide generation, and achieve results comparable to the closed-source model GPT-4o. We further explore iterative design refinement, where the model is tasked to self-refine its own output, and find that this process improves slide quality. We hope that our work will provide a basis for future work on generating structured visuals.
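The two evaluation paths can be illustrated with a minimal sketch, assuming slide text has already been extracted. The token-overlap and element-count heuristics here are illustrative simplifications chosen for brevity, not SlidesBench's actual metrics:

```python
def reference_based_score(generated: str, reference: str) -> float:
    """Similarity to a target slide, here as token-level Jaccard overlap."""
    g = set(generated.lower().split())
    r = set(reference.lower().split())
    return len(g & r) / len(g | r) if g | r else 1.0

def reference_free_score(generated: str) -> float:
    """Standalone quality proxy: rewards slides with several text elements."""
    elements = [line for line in generated.splitlines() if line.strip()]
    return min(len(elements) / 3.0, 1.0)  # saturates at three elements
```

A real reference-based metric would also compare visual layout (e.g., rendered-image similarity), and a reference-free one would score design properties such as alignment and contrast; the shapes of the two interfaces are what matter here.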