PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

📅 2025-12-03
🤖 AI Summary
Existing automated graphic design methods suffer from insufficient geometric precision in layout generation, limited layer-level controllability, and poor support for iterative editing, all of which hinder adoption in professional design workflows. To address these limitations, we propose an iterative, layer-controllable layout framework tailored to real-world design practice. The approach couples large vision-language models with generative models and trains the layout model with a three-stage progressive strategy: perturbation-aware supervised fine-tuning, followed by two reinforcement learning stages, one aligning layouts with visual realism and one optimizing for aesthetic quality. The framework improves both geometric accuracy and aesthetic quality, outperforming state-of-the-art methods across multiple benchmarks. It also supports precise editing of a single element while preserving global visual coherence and structural consistency, narrowing the gap between automated generation and professional iterative design.

📝 Abstract
Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this process using Large Multimodal Models (LMMs), yet existing methods often produce geometrically inaccurate layouts and lack the iterative, layer-specific editing required in professional workflows. To address these limitations, we present PosterCopilot, a framework that advances layout reasoning and controllable editing for professional graphic design. Specifically, we introduce a progressive three-stage training strategy that equips LMMs with geometric understanding and aesthetic reasoning for layout design, consisting of Perturbed Supervised Fine-Tuning, Reinforcement Learning for Visual-Reality Alignment, and Reinforcement Learning from Aesthetic Feedback. Furthermore, we develop a complete workflow that couples the trained LMM-based design model with generative models, enabling layer-controllable, iterative editing for precise element refinement while maintaining global visual consistency. Extensive experiments demonstrate that PosterCopilot achieves geometrically accurate and aesthetically superior layouts, offering unprecedented controllability for professional iterative design.
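
The idea of layer-controllable editing with a geometric accuracy check can be illustrated with a minimal sketch. It is not PosterCopilot's implementation: the `Layer` type, the `(x, y, width, height)` box convention, the `edit_layer` helper, and the use of intersection-over-union (IoU) as the accuracy measure are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

# Hypothetical box convention: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Layer:
    """One design element (hypothetical type, not from the paper)."""
    name: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def edit_layer(layout: List[Layer], name: str, new_box: Box) -> List[Layer]:
    """Return a new layout in which only the named layer is moved or resized."""
    if not any(layer.name == name for layer in layout):
        raise KeyError(name)
    return [replace(layer, box=new_box) if layer.name == name else layer
            for layer in layout]


# Example poster with three layers; only the title is edited.
poster = [
    Layer("background", (0, 0, 800, 1200)),
    Layer("title", (100, 80, 600, 150)),
    Layer("logo", (650, 1050, 100, 100)),
]
edited = edit_layer(poster, "title", (100, 100, 600, 150))
```

Because `edit_layer` returns a new list and leaves every other `Layer` untouched, one element can be refined repeatedly without disturbing the rest of the design, and `iou` gives one simple way to compare a predicted box against a reference box.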
Problem

Research questions and friction points this paper is trying to address.

How to automate professional graphic design with geometrically accurate layouts
How to support layer-specific, iterative editing within design workflows
How to ensure both geometric and aesthetic quality in generated designs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive three-stage training strategy for LMMs
Coupling LMM with generative models for layer editing
Achieving geometric accuracy and aesthetic layout control
Jiazhe Wei
PRLab, Nanjing University
Ken Li
PRLab, Nanjing University
Tianyu Lao
LibLib.ai
Haofan Wang
Lovart AI, InstantX, Carnegie Mellon University
AI Design, Image Generation, Generative AI
Liang Wang
PRLab, Nanjing University; Institute of Automation, Chinese Academy of Sciences
Caifeng Shan
Philips Research
Computer Vision, Pattern Recognition, Machine Learning, Image/Video Analysis
Chenyang Si
PRLab, Nanjing University