PosterVerse: A Full-Workflow Framework for Commercial-Grade Poster Generation with HTML-Based Scalable Typography

📅 2026-01-07
🏛️ arXiv.org
🤖 AI Summary
This work proposes PosterVerse, an end-to-end, commercial-grade framework for automatic poster generation. It targets three weaknesses of existing methods: incomplete design workflows, inaccurate text rendering, and limited flexibility for commercial use. The framework is a three-stage pipeline: blueprint generation, background synthesis, and unified HTML-driven layout and text rendering. Its main contributions are PosterDNA, which the authors describe as, to their knowledge, the first Chinese poster generation dataset to include HTML typography files; fine-tuned large language models (LLMs) for design element extraction; a customized diffusion model for background generation; and a multimodal LLM (MLLM)-powered HTML rendering engine that handles small-font, high-density text. Experiments indicate that the generated posters reach commercial quality in visual appeal, text alignment accuracy, and layout customizability.
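
To make the idea of HTML-driven layout and text rendering concrete, here is a minimal sketch in Python. It is not the PosterVerse engine: `TextBlock` and `render_poster_html` are hypothetical names, and the choice to store positions and font sizes as fractions of the poster size is an assumption made for illustration. It shows why text kept as HTML stays exact and scales with the canvas instead of being baked into pixels.

```python
from __future__ import annotations

from dataclasses import dataclass
from html import escape


@dataclass
class TextBlock:
    """Hypothetical text element; all geometry is relative to the poster."""
    text: str
    left: float       # fraction of poster width, 0..1
    top: float        # fraction of poster height, 0..1
    font_size: float  # fraction of poster width


def render_poster_html(background_url: str, blocks: list[TextBlock],
                       width: int, height: int) -> str:
    """Layer absolutely positioned text over a background image."""
    divs = []
    for block in blocks:
        style = (
            f"position:absolute;left:{block.left * 100:.2f}%;"
            f"top:{block.top * 100:.2f}%;"
            f"font-size:{block.font_size * width:.2f}px"
        )
        # The text stays as real characters, so the browser renders it exactly.
        divs.append(f'<div style="{style}">{escape(block.text)}</div>')
    container = (
        f"position:relative;width:{width}px;height:{height}px;"
        f"background:url('{escape(background_url, quote=True)}') center/cover"
    )
    return f'<div style="{container}">' + "".join(divs) + "</div>"
```

Rendering the same blocks at a larger `width` only changes the computed pixel sizes, so small, dense text can be re-rendered at any resolution without retraining or regenerating the background.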

📝 Abstract
Commercial-grade poster design demands the seamless integration of aesthetic appeal with precise, informative content delivery. Current automated poster generation systems face significant limitations, including incomplete design workflows, poor text rendering accuracy, and insufficient flexibility for commercial applications. To address these challenges, we propose PosterVerse, a full-workflow, commercial-grade poster generation method that seamlessly automates the entire design process while delivering high-density and scalable text rendering. PosterVerse replicates professional design through three key stages: (1) blueprint creation using fine-tuned LLMs to extract key design elements from user requirements, (2) graphical background generation via customized diffusion models to create visually appealing imagery, and (3) unified layout-text rendering with an MLLM-powered HTML engine to guarantee high text accuracy and flexible customization. In addition, we introduce PosterDNA, a commercial-grade, HTML-based dataset tailored for training and validating poster design models. To the best of our knowledge, PosterDNA is the first Chinese poster generation dataset to introduce HTML typography files, enabling scalable text rendering and fundamentally solving the challenges of rendering small and high-density text. Experimental results demonstrate that PosterVerse consistently produces commercial-grade posters with appealing visuals, accurate text alignment, and customizable layouts, making it a promising solution for automating commercial poster design. The code and model are available at https://github.com/wuhaer/PosterVerse.
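
The three stages described in the abstract can be read as a simple data flow. The sketch below is a hedged illustration only: `Blueprint`, `generate_poster`, and the three stub functions are hypothetical names, and the stubs stand in for the fine-tuned LLM, the diffusion model, and the MLLM-powered HTML engine, none of whose real interfaces are given here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Blueprint:
    """Hypothetical container for design elements extracted in stage 1."""
    title: str
    body: list[str]
    background_prompt: str


def generate_poster(
    request: str,
    plan: Callable[[str], Blueprint],          # stage 1: blueprint creation
    paint: Callable[[str], str],               # stage 2: background generation
    typeset: Callable[[Blueprint, str], str],  # stage 3: layout-text rendering
) -> str:
    """Run the three stages in order and return the final HTML string."""
    blueprint = plan(request)
    background_path = paint(blueprint.background_prompt)
    return typeset(blueprint, background_path)


# Placeholder stages so the sketch runs; real models would replace these.
def stub_plan(request: str) -> Blueprint:
    return Blueprint(title=request, body=["Details go here"],
                     background_prompt=f"background for: {request}")


def stub_paint(prompt: str) -> str:
    return "background.png"


def stub_typeset(blueprint: Blueprint, background_path: str) -> str:
    lines = "".join(f"<p>{line}</p>" for line in blueprint.body)
    return (f'<div data-bg="{background_path}">'
            f"<h1>{blueprint.title}</h1>{lines}</div>")
```

Passing the stages in as callables keeps each one replaceable, which mirrors the abstract's point that text and layout are handled separately from the background image.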
Problem

Research questions and friction points this paper is trying to address.

poster generation
text rendering
commercial-grade design
scalable typography
automated design workflow
Innovation

Methods, ideas, or system contributions that make the work stand out.

HTML-based typography
full-workflow poster generation
scalable text rendering
diffusion models
PosterDNA dataset
Junle Liu
South China University of Technology
AIGC
Peirong Zhang
South China University of Technology
Yuyi Zhang
South China University of Technology
Computer Vision, Diffusion, Image generation, Handwritten Character Recognition, OCR
Pengyu Yan
South China University of Technology
Hui Zhou
Intsig Information Co., Ltd.
Xinyue Zhou
Intsig Information Co., Ltd.
Fengjun Guo
Intsig Information Co., Ltd.
Lianwen Jin
Professor of Electronic and Information Engineering, South China University of Technology
Optical Character Recognition (OCR), Computer Vision, Document AI, Multimodal LLMs