Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Academic poster generation faces three core challenges: long-document compression, joint text-image layout optimization, and visual-semantic alignment. This paper introduces PosterAgent, the first visual-in-the-loop multi-agent framework designed specifically for academic poster generation. Built on the lightweight open-source Qwen-2.5 model, it integrates a structured parser, a binary-tree reading-flow layout planner, and a VLM-driven Painter-Commenter rendering-feedback mechanism, producing editable PPTX output. The authors construct the first dedicated poster-generation benchmark and propose a four-dimensional evaluation framework, including the novel PaperQuiz metric for content-fidelity assessment. Experiments demonstrate that PosterAgent consistently outperforms the GPT-4o multi-agent baseline in content fidelity, layout quality, and visual quality, while reducing token consumption by 87% and lowering per-generation cost to just $0.005. It supports end-to-end processing of papers up to 22 pages long.
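The Painter-Commenter mechanism described above can be pictured as a render-critique-revise loop. Below is a minimal sketch, assuming a toy overflow check in place of the paper's actual rendering code and VLM judge; `render_panel`, `critique_panel`, `revise_panel`, and `PANEL_CHAR_LIMIT` are hypothetical stand-ins, not the system's real API:

```python
PANEL_CHAR_LIMIT = 200  # assumed capacity; the real system judges overflow with a VLM

def render_panel(panel):
    # Stand-in for executing the real rendering code: here we just
    # measure how much text the panel would draw.
    return {"chars": len(panel["text"])}

def critique_panel(rendering):
    # Stand-in for the VLM commenter: flag any overflow past the limit.
    overflow = rendering["chars"] - PANEL_CHAR_LIMIT
    return {"ok": overflow <= 0, "overflow": max(overflow, 0)}

def revise_panel(panel, feedback):
    # Stand-in for the painter's revision step: trim overflowing text.
    keep = len(panel["text"]) - feedback["overflow"]
    return {**panel, "text": panel["text"][:keep]}

def painter_commenter(panel, max_rounds=3):
    """Render, critique, and revise until the commenter approves
    or the round budget is exhausted."""
    draft = panel
    for _ in range(max_rounds):
        feedback = critique_panel(render_panel(draft))
        if feedback["ok"]:
            break
        draft = revise_panel(draft, feedback)
    return draft
```

The bounded round count mirrors the practical need to cap VLM calls per panel, which is one place the reported 87% token savings can come from.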

📝 Abstract
Academic poster generation is a crucial yet challenging task in scientific communication, requiring the compression of long-context interleaved documents into a single, visually coherent page. To address this challenge, we introduce the first benchmark and metric suite for poster generation, which pairs recent conference papers with author-designed posters and evaluates outputs on (i) Visual Quality: semantic alignment with human posters; (ii) Textual Coherence: language fluency; (iii) Holistic Assessment: six fine-grained aesthetic and informational criteria scored by a VLM-as-judge; and notably (iv) PaperQuiz: the poster's ability to convey core paper content as measured by VLMs answering generated quizzes. Building on this benchmark, we propose PosterAgent, a top-down, visual-in-the-loop multi-agent pipeline: the (a) Parser distills the paper into a structured asset library; the (b) Planner aligns text-visual pairs into a binary-tree layout that preserves reading order and spatial balance; and the (c) Painter-Commenter loop refines each panel by executing rendering code and using VLM feedback to eliminate overflow and ensure alignment. In our comprehensive evaluation, we find that GPT-4o outputs, though visually appealing at first glance, often exhibit noisy text and poor PaperQuiz scores, and that reader engagement is the primary aesthetic bottleneck, as human-designed posters rely largely on visual semantics to convey meaning. Our fully open-source variants (e.g., based on the Qwen-2.5 series) outperform existing 4o-driven multi-agent systems across nearly all metrics, while using 87% fewer tokens. PosterAgent transforms a 22-page paper into a finalized yet editable .pptx poster for just $0.005. These findings chart clear directions for the next generation of fully automated poster-generation models. The code and datasets are available at https://github.com/Paper2Poster/Paper2Poster.
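The Planner's binary-tree layout can be illustrated by recursively splitting the poster canvas, alternating split direction so that reading order is preserved and each panel's area is proportional to its content. This is a minimal sketch under stated assumptions; the `Panel`/`Region` types and the weight-proportional split rule are illustrative, not the paper's actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class Panel:
    name: str
    weight: float  # relative content size

@dataclass
class Region:
    x: float
    y: float
    w: float
    h: float

def layout(panels, region, horizontal=True):
    """Recursively split `region` between the panels in reading order,
    alternating split direction, with area proportional to weight."""
    if len(panels) == 1:
        return {panels[0].name: region}
    mid = len(panels) // 2
    first, rest = panels[:mid], panels[mid:]
    frac = sum(p.weight for p in first) / sum(p.weight for p in panels)
    if horizontal:
        r1 = Region(region.x, region.y, region.w * frac, region.h)
        r2 = Region(region.x + region.w * frac, region.y,
                    region.w * (1 - frac), region.h)
    else:
        r1 = Region(region.x, region.y, region.w, region.h * frac)
        r2 = Region(region.x, region.y + region.h * frac,
                    region.w, region.h * (1 - frac))
    boxes = layout(first, r1, not horizontal)
    boxes.update(layout(rest, r2, not horizontal))
    return boxes
```

Because each split keeps earlier panels left of or above later ones, an in-order traversal of the tree matches the poster's reading flow, and the regions tile the canvas exactly.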
Problem

Research questions and friction points this paper is trying to address.

Automating academic poster generation from scientific papers
Evaluating poster quality via multimodal metrics
Optimizing visual-textual coherence and reader engagement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces benchmark and metric suite for poster generation
Proposes top-down multi-agent pipeline PosterAgent
Uses VLM feedback to refine panel alignment