AI Summary
This work addresses two challenges in comic-style image sequence generation: inter-frame inconsistency in character identity and attire, and monotonous pose variation. We propose a retrieval-augmented, region-controlled diffusion model that integrates image retrieval, text–image alignment, region-conditioned modeling, and diffusion model fine-tuning. Key contributions include: (1) a retrieval-based character matching module that aligns textual prompts with character appearance using reference images; and (2) a region-wise character feature injection mechanism enabling localized control over facial features, clothing, and other semantic parts. On multi-frame comic generation, our approach substantially improves character consistency and pose diversity, yielding more coherent narratives and more vivid visuals.
Abstract
We present RaCig, a novel system for generating comic-style image sequences with consistent characters and expressive gestures. RaCig addresses two key challenges: (1) maintaining character identity and costume consistency across frames, and (2) producing diverse and vivid character gestures. Our approach integrates a retrieval-based character assignment module, which aligns characters in textual prompts with reference images, and a regional character injection mechanism that embeds character features into specified image regions. Experimental results demonstrate that RaCig effectively generates engaging comic narratives with coherent characters and dynamic interactions. The source code will be publicly available to support further research in this area.
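RaCig's exact injection mechanism is not detailed above, but the region-wise idea of embedding character features into specified image regions can be illustrated as masked feature blending over a spatial feature map. The function name, shapes, and blending rule below are hypothetical, a minimal sketch rather than the paper's implementation:

```python
import numpy as np

def inject_regional_features(feature_map, char_features, region_masks, alpha=0.5):
    """Blend per-character reference features into masked regions of a
    feature map (illustrative only; shapes and blending rule are assumed).

    feature_map:   np.ndarray, shape (H, W, C) -- base diffusion features
    char_features: list of np.ndarray, each shape (C,) -- one pooled
                   embedding per character (e.g. from a reference image)
    region_masks:  list of np.ndarray, each shape (H, W), values in {0, 1},
                   marking where each character should appear
    alpha:         blending strength of the injected features
    """
    out = feature_map.copy()
    for feat, mask in zip(char_features, region_masks):
        m = mask[..., None].astype(out.dtype)           # (H, W, 1)
        # Inside a character's region, mix its embedding into the features;
        # outside the mask the features are left unchanged.
        out = out * (1 - alpha * m) + alpha * m * feat  # feat broadcasts over (C,)
    return out

# Toy usage: two characters assigned to the left and right halves of an 8x8 map.
H, W, C = 8, 8, 4
fmap = np.zeros((H, W, C))
left = np.zeros((H, W)); left[:, : W // 2] = 1
right = 1 - left
out = inject_regional_features(fmap, [np.ones(C), 2 * np.ones(C)], [left, right])
```

A real system would apply such conditioning inside the denoising network (e.g. via masked cross-attention) rather than on raw feature maps, but the sketch captures the core idea of localized, per-character control.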