AI Summary
Current text-to-image diffusion models struggle to achieve cross-frame consistent, flexible, and fine-grained post-generation editing, limiting their practicality for narrative visualization. To address this, we propose the first zero-shot, fine-tuning-free hierarchical editing framework that jointly decouples semantic control (character/scenario/action), incorporates semantic layout guidance, enables latent-space directional editing, and enforces zero-shot inter-frame alignment, thereby supporting coherent story-level image sequence generation and multi-granularity post-editing. Our method preserves both temporal visual coherence and narrative consistency while enabling coarse-grained structural adjustments and fine-grained attribute modifications. Extensive evaluations on multiple benchmarks demonstrate state-of-the-art performance in editing accuracy, inter-frame consistency, and controllability.
Abstract
Text-to-image diffusion models have demonstrated a significant ability to generate diverse and detailed visuals across many domains, and story visualization is emerging as a particularly promising application. However, as their use in real-world creative workflows grows, providing enhanced control, refinement, and consistent post-generation modification of images becomes an important challenge. Existing methods often lack the flexibility to apply fine or coarse edits while maintaining visual and narrative consistency across multiple frames, preventing creators from seamlessly crafting and refining their visual stories. To address these challenges, we introduce Plot'n Polish, a zero-shot framework that enables consistent story generation and provides fine-grained control over story visualizations at various levels of detail.