SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency

📅 2025-10-27


🤖 AI Summary

Current text-to-image generation models struggle to ensure scene-level narrative coherence and cross-story scene consistency in multi-image storytelling; in particular, they lack structured scene planning and any way to share a scene across stories over the long term. To address this, we propose the first scene-oriented story generation framework: (1) scene planning guided by vision-language models (VLMs), which proceeds in a global-to-local manner and explicitly encodes spatial, temporal, and semantic constraints; and (2) a long-term scene-sharing attention mechanism inside the diffusion model, which keeps scenes consistent across stories without additional training while preserving subject diversity. Experiments show that the approach outperforms state-of-the-art methods in both scene-level narrative coherence and visual consistency. It offers a scalable, training-free paradigm for consistent story generation, with direct applications in artistic creation, film storyboarding, and game narrative design.
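To make the "global-to-local" planning idea more concrete, here is a minimal sketch of how such a plan could be represented. It assumes the VLM is wrapped in two hypothetical callables, `scenes_for` (global step: theme to ordered scenes) and `frames_for` (local step: one scene to frame-level details); these names, the `ScenePlan` layout, and the prompt template are illustrative assumptions, not SceneDecorator's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ScenePlan:
    # Global level: one scene description shared by every frame in this scene.
    scene: str
    # Local level: frame-specific details placed inside that shared scene.
    frames: list[str] = field(default_factory=list)


def plan_global_to_local(
    theme: str,
    scenes_for: Callable[[str], list[str]],  # hypothetical VLM wrapper (global step)
    frames_for: Callable[[str], list[str]],  # hypothetical VLM wrapper (local step)
) -> list[ScenePlan]:
    # Global step: decide the ordered sequence of scenes for the whole story first.
    plans = [ScenePlan(scene=s) for s in scenes_for(theme)]
    # Local step: refine each scene into frame-level descriptions, conditioned on it.
    for plan in plans:
        plan.frames = frames_for(plan.scene)
    return plans


def to_prompts(plans: list[ScenePlan], subject: str) -> list[str]:
    # Each frame prompt repeats its scene description, so frames within one
    # scene are tied to the same setting (hypothetical prompt template).
    return [f"{subject} {frame}, {plan.scene}" for plan in plans for frame in plan.frames]


if __name__ == "__main__":
    # Stub "VLM" responses stand in for real model calls.
    plans = plan_global_to_local(
        "a fox's journey",
        lambda theme: ["a snowy forest", "a quiet village"],
        lambda scene: ["walking", "resting"],
    )
    for prompt in to_prompts(plans, "a red fox"):
        print(prompt)
```

Planning globally before expanding locally means the scene sequence is fixed once for the whole story, and each frame is generated inside a scene rather than from an isolated text description. The summary says the paper's planner also encodes spatial, temporal, and semantic constraints, which this sketch does not model.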


📝 Abstract

Recent text-to-image models have revolutionized image generation, but they still struggle to maintain concept consistency across generated images. While existing works focus on character consistency, they often overlook the crucial role of scenes in storytelling, which limits their creative potential in practice. This paper introduces scene-oriented story generation, addressing two key challenges: (i) scene planning, where current methods fail to ensure scene-level narrative coherence because they rely solely on text descriptions, and (ii) scene consistency, i.e., keeping a scene consistent across multiple stories, which remains largely unexplored. We propose SceneDecorator, a training-free framework that employs VLM-Guided Scene Planning to ensure narrative coherence across different scenes in a "global-to-local" manner, and Long-Term Scene-Sharing Attention to maintain long-term scene consistency and subject diversity across generated stories. Extensive experiments demonstrate the superior performance of SceneDecorator, highlighting its potential to unleash creativity in the fields of arts, films, and games.
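A common way to share content across diffusion generations without training is to let each image's self-attention queries also attend to keys and values cached from a reference. The NumPy sketch below illustrates that general idea as one possible reading of "Long-Term Scene-Sharing Attention"; it is not the paper's implementation. `SceneBank`, `scene_shared_attention`, and the boolean `scene_mask` that excludes subject tokens from sharing are hypothetical names and design choices introduced here for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class SceneBank:
    """Hypothetical cache: one set of scene keys/values per scene id,
    stored once and reused for every story set in that scene."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[np.ndarray, np.ndarray, np.ndarray]] = {}

    def put(self, scene_id: str, k: np.ndarray, v: np.ndarray, mask: np.ndarray) -> None:
        self._store[scene_id] = (k, v, mask)

    def get(self, scene_id: str) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
        return self._store[scene_id]


def scene_shared_attention(q, k, v, scene_k, scene_v, scene_mask):
    """Self-attention whose keys/values are extended with shared scene tokens.

    q, k, v:          (n, d) features of the image currently being generated.
    scene_k, scene_v: (m, d) features cached from the scene reference.
    scene_mask:       (m,) bool; True marks scene (background) tokens. Subject
                      tokens are False, so they are not shared across stories.
    """
    d = q.shape[-1]
    # Append only the masked-in scene tokens to this image's own keys/values.
    k_all = np.concatenate([k, scene_k[scene_mask]], axis=0)
    v_all = np.concatenate([v, scene_v[scene_mask]], axis=0)
    # Scaled dot-product attention over own tokens plus shared scene tokens.
    scores = q @ k_all.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ v_all
```

In this sketch, a story's frames read scene appearance from the cached reference tokens, which is what would keep the scene stable across stories, while excluding subject tokens from the shared set leaves each story's subject free to vary. With an all-False mask the function reduces to ordinary self-attention.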

Problem


Ensuring narrative coherence across different generated scenes

Maintaining long-term scene consistency across multiple stories

Addressing scene-level planning limitations in text-to-image models

Innovation


VLM-Guided Scene Planning for narrative coherence

Long-Term Scene-Sharing Attention for consistency

Training-free framework ensuring subject diversity across stories
