🤖 AI Summary
Existing autoregressive visual storytelling methods suffer from high memory overhead, slow inference, and weak contextual modeling, resulting in poor consistency of characters and scenes across image sequences. To address these limitations, we propose an efficient generative framework tailored for visual narrative synthesis. Our approach introduces a Spatially-Enhanced Temporal Attention mechanism for fine-grained spatiotemporal modeling, a Storyline Contextualizer to capture global narrative structure, and a StoryFlow Adapter to explicitly model dynamic scene evolution. Furthermore, we adopt a multi-stage generation architecture integrating diffusion or VAE-based components to improve controllability and fidelity. Evaluated on the PororoSV and FlintstonesSV benchmarks, our method achieves a 23.6% reduction in FID and an 18.4% improvement in CLIP-Score across both story visualization and story continuation tasks, significantly outperforming state-of-the-art approaches.
📝 Abstract
Visual storytelling involves generating a sequence of coherent frames from a textual storyline while maintaining consistency in characters and scenes. Existing autoregressive methods, which rely on previous frame-sentence pairs, struggle with high memory usage, slow generation speeds, and limited context integration. To address these issues, we propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation. ContextualStory utilizes Spatially-Enhanced Temporal Attention to capture spatial and temporal dependencies, handling significant character movements effectively. Additionally, we introduce a Storyline Contextualizer to enrich the context in storyline embeddings and a StoryFlow Adapter to measure scene changes between frames to guide the model. Extensive experiments on the PororoSV and FlintstonesSV benchmarks demonstrate that ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
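The abstract does not specify how Spatially-Enhanced Temporal Attention is implemented. As a rough intuition only, a common pattern for temporal attention over a frame sequence is to let each spatial location attend across time; the sketch below illustrates that generic pattern in NumPy, with all function and parameter names (`spatial_temporal_attention`, `Wq`, `Wk`, `Wv`) hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_temporal_attention(frames, Wq, Wk, Wv):
    """Illustrative sketch (not the paper's method): every spatial
    location attends over the T frames of one story sequence.

    frames: (T, H, W, C) per-frame feature maps
    Wq, Wk, Wv: (C, C) hypothetical projection matrices
    """
    T, H, W, C = frames.shape
    # Flatten space, then move the spatial axis first: (H*W, T, C).
    x = frames.reshape(T, H * W, C).transpose(1, 0, 2)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Scaled dot-product attention over the time axis: (H*W, T, T).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(C)
    out = softmax(scores) @ v                      # (H*W, T, C)
    # Restore the original frame-sequence layout: (T, H, W, C).
    return out.transpose(1, 0, 2).reshape(T, H, W, C)
```

The output keeps the input layout, so such a block could be dropped between spatial layers of a frame-wise backbone; the actual ContextualStory design may differ substantially.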