PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

📅 2025-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address two key challenges in high-quality Chinese product poster generation for e-commerce—imprecise text rendering (especially for large-character-set Chinese, exceeding 10,000 glyphs) and low fidelity of product主体 preservation—this paper proposes an end-to-end co-generative framework. Methodologically, we introduce TextRenderNet, the first diffusion-based, character-level visual representation network with discriminative feature control for accurate text rendering; SceneGenNet, integrating inpainting-driven scene generation with subject-fidelity feedback learning; and a two-stage decoupled training strategy. Our contribution is the first joint optimization of high-precision, controllable text rendering for complex writing systems and structural-aware subject fidelity. Experiments demonstrate a 90.3% Chinese text rendering accuracy and a 27.5% improvement in product structural fidelity, significantly outperforming state-of-the-art methods.

Technology Category

Application Category

📝 Abstract
Product posters, which integrate subject, scene, and text, are crucial promotional tools for attracting customers. Creating such posters using modern image generation methods is valuable, while the main challenge lies in accurately rendering text, especially for complex writing systems like Chinese, which contains over 10,000 individual characters. In this work, we identify the key to precise text rendering as constructing a character-discriminative visual feature as a control signal. Based on this insight, we propose a robust character-wise representation as control and we develop TextRenderNet, which achieves a high text rendering accuracy of over 90%. Another challenge in poster generation is maintaining the fidelity of user-specific products. We address this by introducing SceneGenNet, an inpainting-based model, and propose subject fidelity feedback learning to further enhance fidelity. Based on TextRenderNet and SceneGenNet, we present PosterMaker, an end-to-end generation framework. To optimize PosterMaker efficiently, we implement a two-stage training strategy that decouples text rendering and background generation learning. Experimental results show that PosterMaker outperforms existing baselines by a remarkable margin, which demonstrates its effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Accurate text rendering in complex writing systems like Chinese
Maintaining fidelity of user-specific products in posters
End-to-end framework for high-quality product poster generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs character-discriminative visual features
Uses TextRenderNet for 90% text accuracy
Employs SceneGenNet for product fidelity
Y
Yifan Gao
University of Science and Technology of China, Taobao & Tmall Group of Alibaba
Zihang Lin
Zihang Lin
Sun Yat-sen University, Master Student
Computer VisionDeep Learning
C
Chuanbin Liu
University of Science and Technology of China
M
Min Zhou
Taobao & Tmall Group of Alibaba
Tiezheng Ge
Tiezheng Ge
Senior staff algorithm engineer, Alimama, Alibaba Group
Computer VisionAIGCRecommender Systems
B
Bo Zheng
Taobao & Tmall Group of Alibaba
H
Hongtao Xie
University of Science and Technology of China