🤖 AI Summary
This work addresses key challenges in high-quality creative poster generation (text rendering distortion, disconnection between artistic content and layout, and stylistic inconsistency) by proposing PosterCraft, an end-to-end unified generation framework. Methodologically, it introduces: (1) a four-stage cascaded optimization pipeline consisting of large-scale text-rendering optimization, region-aware supervised fine-tuning, aesthetic-text reinforcement learning via best-of-n preference optimization, and joint vision-language feedback refinement; (2) fully automated data-construction pipelines tailored to each stage, including the Text-Render-2M and HQ-Poster100K datasets; and (3) a training recipe that requires no complex architectural modifications. Experiments show that the method significantly outperforms open-source baselines in text rendering accuracy, layout coherence, and overall visual appeal, approaching the quality of state-of-the-art commercial systems.
📝 Abstract
Generating aesthetic posters is more challenging than generating simple design images: it requires not only precise text rendering but also the seamless integration of abstract artistic content, striking layouts, and overall stylistic harmony. To address this, we propose PosterCraft, a unified framework that abandons prior modular pipelines and rigid, predefined layouts, allowing the model to freely explore coherent, visually compelling compositions. PosterCraft employs a carefully designed, cascaded workflow to optimize the generation of high-aesthetic posters: (i) large-scale text-rendering optimization on our newly introduced Text-Render-2M dataset; (ii) region-aware supervised fine-tuning on HQ-Poster100K; (iii) aesthetic-text reinforcement learning via best-of-n preference optimization; and (iv) joint vision-language feedback refinement. Each stage is supported by a fully automated data-construction pipeline tailored to its specific needs, enabling robust training without complex architectural modifications. Across multiple experiments, PosterCraft significantly outperforms open-source baselines in rendering accuracy, layout coherence, and overall visual appeal, approaching the quality of SOTA commercial systems. Our code, models, and datasets are available on the project page: https://ephemeral182.github.io/PosterCraft
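The abstract does not spell out how stage (iii) builds its preference data, so the following is only a minimal sketch of one common reading of "best-of-n": sample n candidate posters for a prompt, score each one, and keep the highest- and lowest-scoring candidates as a (chosen, rejected) pair for preference optimization. The names `PreferencePair`, `best_of_n_pair`, and the `score` callable are hypothetical and are not taken from the PosterCraft code; the scoring function stands in for whatever aesthetic and text-accuracy reward the authors actually use.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PreferencePair:
    """One training example for preference optimization (hypothetical type)."""
    prompt: str
    chosen: str    # identifier of the highest-scoring candidate
    rejected: str  # identifier of the lowest-scoring candidate


def best_of_n_pair(
    prompt: str,
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> PreferencePair:
    """Rank n candidates by score and pair the best with the worst.

    `score` is an assumed placeholder for a reward combining aesthetics
    and text-rendering accuracy; higher means better.
    """
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=score)  # ascending: worst first, best last
    return PreferencePair(prompt=prompt, chosen=ranked[-1], rejected=ranked[0])


# Toy usage: candidate identifiers with made-up scores.
toy_scores = {"poster_a": 0.2, "poster_b": 0.5, "poster_c": 0.9}
pair = best_of_n_pair("jazz festival poster", list(toy_scores), toy_scores.get)
```

In this toy run the pair selects `poster_c` as chosen and `poster_a` as rejected. Pairing the extremes maximizes the reward gap within each sample group; other schemes, such as pairing the best candidate with every other one, are equally compatible with the abstract's wording.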