PosterSum: A Multimodal Benchmark for Scientific Poster Summarization

📅 2025-02-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing multimodal large language models (MLLMs) perform poorly when summarizing scientific posters, which are complex documents featuring dense text, intricate layouts, tables, and figures. Method: We introduce PosterSum, a novel benchmark comprising 16,305 conference poster images paired with their corresponding paper abstracts, and propose Segment&Summarize, a hierarchical framework that first integrates OCR and region segmentation to localize key visual units, then jointly models them via a visual encoder and LLM to generate structured summaries. Contribution/Results: We benchmark state-of-the-art MLLMs on PosterSum and show that they struggle to interpret and summarize scientific posters. On PosterSum, Segment&Summarize outperforms current MLLMs on automated metrics, achieving a 3.14% gain in ROUGE-L.
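The hierarchical idea described above (summarize each localized region first, then merge the local summaries into one global summary) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Region` type, the `segment_and_summarize` function, and the toy `first_sentence` and `join_summaries` helpers are hypothetical stand-ins, not the authors' implementation. In a real system the two callables would wrap an OCR/segmentation stage and a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """Hypothetical container for one segmented poster region."""
    label: str  # e.g. "intro", "results" (illustrative labels only)
    text: str   # text assumed to have been extracted from the region


def segment_and_summarize(
    regions: List[Region],
    summarize_region: Callable[[Region], str],
    combine: Callable[[List[str]], str],
) -> str:
    """Two-level summarization: local summaries first, then a global merge."""
    # Skip regions with no extracted text (e.g. a figure with no caption).
    local_summaries = [summarize_region(r) for r in regions if r.text.strip()]
    return combine(local_summaries)


# Toy stand-ins for the model calls, used only to make the sketch runnable.
def first_sentence(region: Region) -> str:
    return region.text.split(". ")[0].rstrip(".") + "."


def join_summaries(parts: List[str]) -> str:
    return " ".join(parts)


if __name__ == "__main__":
    demo = [
        Region("intro", "We study posters. They are dense."),
        Region("figure", ""),
        Region("results", "Our method helps. More details follow."),
    ]
    print(segment_and_summarize(demo, first_sentence, join_summaries))
```

Passing the summarizer and combiner in as callables keeps the control flow separate from any particular model, so either level can be swapped without touching the pipeline.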

📝 Abstract

Generating accurate and concise textual summaries from multimodal documents is challenging, especially when dealing with visually complex content like scientific posters. We introduce PosterSum, a novel benchmark to advance the development of vision-language models that can understand and summarize scientific posters into research paper abstracts. Our dataset contains 16,305 conference posters paired with their corresponding abstracts as summaries. Each poster is provided in image format and presents diverse visual understanding challenges, such as complex layouts, dense text regions, tables, and figures. We benchmark state-of-the-art Multimodal Large Language Models (MLLMs) on PosterSum and demonstrate that they struggle to accurately interpret and summarize scientific posters. We propose Segment&Summarize, a hierarchical method that outperforms current MLLMs on automated metrics, achieving a 3.14% gain in ROUGE-L. This will serve as a starting point for future research on poster summarization.
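For readers unfamiliar with the metric, ROUGE-L scores a generated summary by the longest common subsequence (LCS) of tokens it shares with the reference. The sketch below is a simplified version under stated assumptions: it uses lowercased whitespace tokenization and the balanced F1 form, and the function names `lcs_length` and `rouge_l_f1` are illustrative. The exact ROUGE configuration used in the paper (tokenizer, stemming, F-measure weighting) is not given in this summary.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-L F1 (assumes whitespace tokens, no stemming)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "we introduce a benchmark for poster summarization"
    abstract = "we introduce postersum a novel benchmark for summarization"
    print(round(rouge_l_f1(generated, abstract), 3))
```

Because the LCS respects word order but allows gaps, the score rewards summaries that follow the reference's sentence structure without requiring exact contiguous matches.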

Problem

Research questions and friction points this paper is trying to address.

Summarize scientific posters into abstracts

Address complex multimodal content understanding

Improve accuracy of vision-language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Models

Hierarchical Segment&Summarize method

Scientific Poster Summarization
