AI Summary
Existing end-to-end generative models struggle to precisely control layout geometry, visual references, and cross-panel consistency in comic generation. This work proposes an agent-based, multi-stage generation framework that decouples the creative pipeline into modular stages: story planning, character and scene anchoring, layout construction, reference-guided rendering, page composition, and text typesetting. By incorporating a story section memory and explicit intermediate representations, the approach enables editable control over layout, visual assets, and textual elements. In the reported experiments, the method improves layout fidelity and cross-panel consistency over direct-generation baselines while supporting flexible human intervention. The result is a controllable approach to generating long-form comics.
Abstract
End-to-end manga generation is a structured visual storytelling task that requires story decomposition, recurring character and scene grounding, page layout design, panel rendering, page composition, and lettering. However, existing generative models often perform direct page synthesis, entangling these factors in a single visual output and limiting precise control over layout geometry, visual references, and cross-panel consistency. To address these limitations, we propose MangaFlow, an agentic framework for controllable long-form manga generation that decomposes manga creation into planning, grounding, layout construction, reference-conditioned rendering, composition, and text placement. By treating layout and visual references as explicit intermediate variables, MangaFlow enables both simple text-to-manga generation and more precise user-controlled manga creation. This design exposes layout, visual assets, and lettering as editable intermediate controls for refining panel geometry, references, and text placement. To support long-form consistency, MangaFlow introduces a story section memory that links section descriptions with corresponding character, scene, and object references for reuse across panels. We further present a meta-benchmark for evaluating layout controllability, visual consistency, and generation quality. Experiments show that MangaFlow improves layout adherence and cross-panel consistency over direct generation baselines while supporting flexible human control.
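To make the story section memory and the explicit layout variable more concrete, the following is a minimal sketch, not MangaFlow's actual implementation. It assumes the memory maps each story section to its description and to named reference assets, and that a panel looks up an entity in its own section first, then in earlier sections. All class names, field names, and file paths here (`StorySectionMemory`, `Panel`, `resolve_references`, `refs/...`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SectionEntry:
    """One story section: its description plus named visual references."""
    description: str
    # Hypothetical mapping: entity name -> reference asset identifier.
    references: dict[str, str] = field(default_factory=dict)


class StorySectionMemory:
    """Links section descriptions to character, scene, and object references."""

    def __init__(self) -> None:
        # Insertion order doubles as story order.
        self._sections: dict[str, SectionEntry] = {}

    def add_section(self, section_id: str, description: str) -> None:
        self._sections[section_id] = SectionEntry(description)

    def register(self, section_id: str, entity: str, asset: str) -> None:
        self._sections[section_id].references[entity] = asset

    def lookup(self, section_id: str, entity: str) -> str | None:
        """Search the given section first, then earlier ones, most recent first."""
        ids = list(self._sections)
        idx = ids.index(section_id)
        for sid in reversed(ids[: idx + 1]):
            asset = self._sections[sid].references.get(entity)
            if asset is not None:
                return asset
        return None


@dataclass
class Panel:
    """A panel with layout kept as an explicit, editable variable."""
    section_id: str
    box: tuple[float, float, float, float]  # assumed normalized (x, y, w, h)
    entities: list[str]
    prompt: str


def resolve_references(memory: StorySectionMemory, panel: Panel) -> dict[str, str]:
    """Collect the stored references a renderer could condition on for this panel."""
    resolved = {}
    for entity in panel.entities:
        asset = memory.lookup(panel.section_id, entity)
        if asset is not None:
            resolved[entity] = asset
    return resolved


memory = StorySectionMemory()
memory.add_section("s1", "Aiko arrives at the harbor")
memory.register("s1", "Aiko", "refs/aiko_v1.png")
memory.add_section("s2", "Aiko boards the ship")
memory.register("s2", "ship", "refs/ship.png")

panel = Panel("s2", (0.0, 0.0, 0.5, 0.5), ["Aiko", "ship", "captain"], "Aiko steps aboard")
references = resolve_references(memory, panel)
```

In this sketch the panel in section `s2` reuses the `Aiko` reference registered in `s1`, which is one simple way to keep a recurring character consistent across panels. An entity with no stored reference, such as `captain`, is left out, so a separate grounding step would have to create it. Because `box` is plain data, a user could edit the panel geometry before rendering without touching the references.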