SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-image generation models struggle to ensure scene-level narrative coherence and cross-story consistency in multi-image storytelling, particularly lacking structured scene planning and long-term shared modeling. To address this, we propose the first scene-oriented story generation framework: (1) leveraging vision-language models (VLMs) for global–local collaborative scene planning, explicitly encoding spatial, temporal, and semantic constraints; and (2) introducing a long-horizon scene-shared attention mechanism within diffusion models—enabling cross-story scene consistency without additional training while preserving subject diversity. Experiments demonstrate significant improvements over state-of-the-art methods in both scene-level narrative coherence and visual consistency. Our approach establishes a scalable, training-free paradigm for consistent story generation, with direct applicability to artistic creation, film storyboarding, and game narrative design.
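The scene-shared attention idea described above can be sketched in a few lines: during diffusion, each image's self-attention queries also attend to a shared bank of scene keys/values (e.g. cached from a reference scene image), pulling every generated story frame toward the same scene appearance without any training. This is a minimal numpy toy, not the paper's implementation; the function names, single-head setup, and the notion of a "scene bank" built from the first image are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scene_shared_attention(q, k, v, k_bank, v_bank):
    """Single-head attention where queries attend over the image's own
    keys/values concatenated with a shared scene bank, so attention can
    copy scene features across independently generated images."""
    k_all = np.concatenate([k, k_bank], axis=0)      # (n + m, d)
    v_all = np.concatenate([v, v_bank], axis=0)      # (n + m, d)
    scores = q @ k_all.T / np.sqrt(q.shape[-1])      # (n, n + m)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ v_all                           # (n, d)

# Toy usage: cache k/v from a "reference" image, reuse for a new image.
rng = np.random.default_rng(0)
n, m, d = 4, 6, 8
q = rng.standard_normal((n, d))
k = rng.standard_normal((n, d))
v = rng.standard_normal((n, d))
k_bank = rng.standard_normal((m, d))  # keys cached from the shared scene
v_bank = rng.standard_normal((m, d))  # values cached from the shared scene
out = scene_shared_attention(q, k, v, k_bank, v_bank)
```

Because the bank is only concatenated at attention time, the mechanism is drop-in and training-free, which matches the summary's claim; subject diversity would come from keeping subject-specific tokens out of the shared bank.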

📝 Abstract
Recent text-to-image models have revolutionized image generation, but they still struggle with maintaining concept consistency across generated images. While existing works focus on character consistency, they often overlook the crucial role of scenes in storytelling, which restricts their creativity in practice. This paper introduces scene-oriented story generation, addressing two key challenges: (i) scene planning, where current methods fail to ensure scene-level narrative coherence by relying solely on text descriptions, and (ii) scene consistency, which remains largely unexplored in terms of maintaining scene consistency across multiple stories. We propose SceneDecorator, a training-free framework that employs VLM-Guided Scene Planning to ensure narrative coherence across different scenes in a "global-to-local" manner, and Long-Term Scene-Sharing Attention to maintain long-term scene consistency and subject diversity across generated stories. Extensive experiments demonstrate the superior performance of SceneDecorator, highlighting its potential to unleash creativity in the fields of arts, films, and games.
Problem

Research questions and friction points this paper is trying to address.

Ensuring narrative coherence across different generated scenes
Maintaining long-term scene consistency in multiple stories
Addressing scene-level planning limitations in text-to-image models
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-Guided Scene Planning for narrative coherence
Long-Term Scene-Sharing Attention for consistency
Training-free framework ensuring subject diversity across stories
👥 Authors

Quanjian Song — Monash University
Donghao Zhou — The Chinese University of Hong Kong (Machine Learning, Computer Vision)
Jingyu Lin — Monash University
Fei Shen — National University of Singapore (Controllable Generation, Multimodal Safety)
Jiaze Wang — The Chinese University of Hong Kong
Xiaowei Hu — South China University of Technology
Cunjian Chen — Monash University (Generative AI, Computer Vision, Deep Learning)
Pheng-Ann Heng — The Chinese University of Hong Kong