Consistent Story Generation with Asymmetry Zigzag Sampling

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image generation models achieve high single-image fidelity but struggle with cross-image subject consistency (in particular, character and object coherence), a prerequisite for visual storytelling. This paper proposes a training-free diffusion sampling framework centered on an asymmetric zigzag sampling strategy: it alternates between asymmetric text prompts and shares visual features across images in the latent space, enforcing consistency dynamically during generation. Crucially, the method requires no fine-tuning or auxiliary networks; the consistency constraints are introduced solely at inference time. Evaluated on multiple story visualization benchmarks, it improves the Consistency Score by 23.6% over state-of-the-art methods, and qualitative analysis confirms substantial gains in narrative coherence. The approach establishes a zero-shot paradigm for cross-image consistent generation, offering a lightweight, plug-and-play solution for visual storytelling without architectural or training overhead.
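The zigzag loop described in the summary can be sketched as a toy sampling routine: each iteration takes a denoising step conditioned on the full, subject-rich prompt, partially re-inverts the latent, then denoises again under a stripped-down prompt, so subject-specific signal is reinforced at every step. Everything here (`toy_denoise`, `toy_invert`, the step sizes) is a hypothetical stand-in for the paper's diffusion components, not its actual implementation.

```python
import numpy as np

def toy_denoise(x, prompt_emb, t):
    """Hypothetical one-step denoiser: pulls the latent toward the
    prompt embedding with strength proportional to the timestep."""
    return x + 0.1 * t * (prompt_emb - x)

def toy_invert(x, t, rng):
    """Hypothetical partial inversion: re-injects a small amount of noise."""
    return x + 0.05 * t * rng.standard_normal(x.shape)

def zigzag_sample(full_prompt_emb, plain_prompt_emb, steps=10, seed=0):
    """Zigzag loop: denoise with the detailed prompt, invert partially,
    then denoise with the plain prompt, once per timestep."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(full_prompt_emb.shape)  # initial noise latent
    for t in range(steps, 0, -1):
        x = toy_denoise(x, full_prompt_emb, t)   # forward step, detailed prompt
        x = toy_invert(x, t, rng)                # backward (inversion) step
        x = toy_denoise(x, plain_prompt_emb, t)  # forward step, plain prompt
    return x
```

The asymmetry lies in conditioning the two forward legs of each zigzag on different prompts; in the paper this operates inside a real diffusion sampler rather than this toy dynamical system.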

📝 Abstract
Text-to-image generation models have made significant progress in producing high-quality images from textual descriptions, yet they continue to struggle with maintaining subject consistency across multiple images, a fundamental requirement for visual storytelling. Existing methods attempt to address this by either fine-tuning models on large-scale story visualization datasets, which is resource-intensive, or by using training-free techniques that share information across generations, which still yield limited success. In this paper, we introduce a novel training-free sampling strategy called Zigzag Sampling with Asymmetric Prompts and Visual Sharing to enhance subject consistency in visual story generation. Our approach proposes a zigzag sampling mechanism that alternates asymmetric prompts to retain subject characteristics, while a visual sharing module transfers visual cues across generated images to further enforce consistency. Experimental results, based on both quantitative metrics and qualitative evaluations, demonstrate that our method significantly outperforms previous approaches in generating coherent and consistent visual stories. The code is available at https://github.com/Mingxiao-Li/Asymmetry-Zigzag-StoryDiffusion.
Problem

Research questions and friction points this paper is trying to address.

Maintaining subject consistency in multi-image generation
Resource-intensive fine-tuning for visual storytelling
Limited success of training-free consistency techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zigzag sampling for subject consistency
Asymmetric prompts retain subject characteristics
Visual sharing transfers cues across images
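The visual-sharing idea in the bullets above can be illustrated with a toy batched attention step in which every frame's queries attend over the concatenated keys and values of all frames, so appearance features leak across the batch and pull the images toward a common subject. The identity projections and tensor shapes are assumptions for illustration; the paper's actual module operates inside the diffusion model's attention layers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def shared_attention(latents):
    """Toy visual-sharing step: each frame attends over the tokens of
    every frame in the batch, not just its own."""
    B, N, D = latents.shape                      # frames, tokens, channels
    q = latents                                  # toy: identity projections
    kv = latents.reshape(1, B * N, D)            # concatenate all frames' tokens
    k = np.broadcast_to(kv, (B, B * N, D))       # shared keys for every frame
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(D))
    return attn @ k                              # shared values = shared keys
```

Because each output token is a convex combination of tokens from all frames, repeated application drives the frames' feature statistics toward one another, which is the mechanism behind the consistency gain.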