Story2Board: A Training-Free Approach for Expressive Storyboard Generation

📅 2025-08-13

🤖 AI Summary

Existing methods for storyboard generation overemphasize subject identity while neglecting critical elements of visual narrative, such as spatial composition, background evolution, and narrative pacing. To address this, we propose a training-free framework for expressive storyboard generation that jointly controls character consistency, scene layout, and background progression to keep the visual narrative coherent across panels. Its key innovations are two lightweight consistency mechanisms, Latent Panel Anchoring and Reciprocal Attention Value Mixing, which keep characters and scenes consistent across panels without fine-tuning the diffusion model or changing its architecture. The method uses an off-the-shelf language model to parse a narrative into panel-specific prompts, then synthesizes the panels with a diffusion model augmented by these consistency mechanisms. Evaluated on the newly constructed Rich Storyboard Benchmark and in a user study, the approach produces more dynamic, coherent, and narratively engaging storyboards than existing baselines, with particularly notable gains on the proposed Scene Diversity metric.

📝 Abstract

We present Story2Board, a training-free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring, which preserves a shared character reference across panels, and Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention. Together, these mechanisms enhance coherence without architectural changes or fine-tuning, enabling state-of-the-art diffusion models to generate visually diverse yet consistent storyboards. To structure generation, we use an off-the-shelf language model to convert free-form stories into grounded panel-level prompts. To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain narratives designed to assess layout diversity and background-grounded storytelling, in addition to consistency. We also introduce a new Scene Diversity metric that quantifies spatial and pose variation across storyboards. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.
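The abstract describes Reciprocal Attention Value Mixing only at a high level. Below is a minimal NumPy sketch of the idea for a single attention head, under assumptions that are not taken from the paper: the reciprocal attention between a target-panel token i and a reference-panel token j is scored as min(A[i, j], A[j, i]) (a product would be an equally plausible choice); each target token is paired with its strongest reciprocal partner; only the top fraction of target tokens by that score is mixed; and the blend is one-directional, pulling target values toward reference values. The function name `reciprocal_value_mixing` and the parameters `lam` and `top_frac` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def reciprocal_value_mixing(q, k, v, ref_idx, tgt_idx, lam=0.5, top_frac=0.1):
    """Hypothetical single-head sketch of reciprocal attention value mixing.

    q, k, v: (N, d) arrays of queries, keys and values for all image tokens.
    ref_idx, tgt_idx: disjoint integer index arrays (NumPy) for reference-
    and target-panel tokens.
    Returns (v_mixed, mask), where mask marks the target tokens that were mixed.
    """
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N, N) attention map
    a_tr = attn[np.ix_(tgt_idx, ref_idx)]          # target -> reference, (T, R)
    a_rt = attn[np.ix_(ref_idx, tgt_idx)].T        # reference -> target, as (T, R)
    recip = np.minimum(a_tr, a_rt)                 # high only if both directions attend
    best = recip.argmax(axis=1)                    # strongest reference partner per target
    score = recip[np.arange(len(tgt_idx)), best]
    # Keep only the top fraction of target tokens by reciprocal score.
    n_keep = max(1, int(round(top_frac * len(tgt_idx))))
    mask = score >= np.sort(score)[-n_keep]
    t = tgt_idx[mask]
    r = ref_idx[best[mask]]
    v_mixed = v.copy()
    # Soft blend: pull each selected target value toward its reference partner.
    v_mixed[t] = (1.0 - lam) * v[t] + lam * v[r]
    return v_mixed, mask
```

In a full model, the mixed values would replace `v` when the attention output is computed (`attn @ v_mixed`) in some subset of layers and denoising steps. Which layers, steps, and blend weights Story2Board actually uses is not stated on this page. Leaving the reference panel's values untouched is a simplification chosen here so that the shared reference stays stable.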

Problem

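Generating expressive storyboards from natural-language stories without training or fine-tuning

Existing methods focus narrowly on subject identity and overlook spatial composition, background evolution, and narrative pacing

Keeping characters and scenes consistent across panels while still varying layout and pose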

Innovation

Training-free framework for storyboard generation

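Latent Panel Anchoring, which preserves a shared character reference across panels

Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention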
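This page does not spell out how Latent Panel Anchoring is implemented. The following minimal sketch assumes a hypothetical layout: each panel is generated as one item in a batch, and each item's latent grid holds a shared reference panel in its top `anchor_rows` rows with the panel-specific scene below. Anchoring then means overwriting every item's reference region with batch item 0's copy before the loop and after each denoising step. As a result, all panels share an identical character reference while their lower regions evolve independently. The function names and the `step_fn` denoiser stand-in are illustrative and not part of any real library.

```python
import numpy as np


def anchor_latents(latents: np.ndarray, anchor_rows: int) -> np.ndarray:
    """Copy batch item 0's reference region into every batch item.

    latents: (B, H, W, C) latent grids, one per storyboard panel. In this
    hypothetical layout, the top `anchor_rows` rows hold the shared reference panel.
    """
    anchored = latents.copy()
    anchored[:, :anchor_rows] = latents[0:1, :anchor_rows]  # broadcast over batch
    return anchored


def anchored_denoise(latents, steps, anchor_rows, step_fn):
    """Run a denoising loop, re-anchoring the reference region after every step.

    step_fn(latents, t) is a hypothetical stand-in for one denoiser update.
    """
    # Start every panel from the same reference noise.
    latents = anchor_latents(latents, anchor_rows)
    for t in range(steps):
        latents = step_fn(latents, t)
        latents = anchor_latents(latents, anchor_rows)
    return latents
```

Because the reference region is identical across the batch, each panel's tokens can attend to the same character reference during generation. The sketch omits the diffusion model, noise scheduler, and text conditioning entirely.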
