🤖 AI Summary
Existing text-to-video models struggle to generate minute-long, coherent, multi-shot narrative videos, a limitation described as the “narrative gap.” This work proposes an end-to-end diffusion-based framework that ensures semantic and visual consistency from the first to the final shot via global scene modeling. Our key contributions are: (1) Window Cross-Attention, which enables fine-grained alignment between textual prompts and individual shots; and (2) Sparse Inter-Shot Self-Attention, which enforces long-range temporal coherence while improving computational efficiency through sparsified cross-shot attention. The architecture shows emergent persistent memory of characters and scenes, and demonstrates an implicit understanding of cinematic techniques, including camera motion and editing. Experiments show our method achieves state-of-the-art narrative coherence and is the first to generate minute-scale, multi-shot videos with expressive cinematic qualities, marking a substantive advance toward automated film production.
📝 Abstract
State-of-the-art text-to-video models excel at generating isolated clips but fall short of creating the coherent, multi-shot narratives that are the essence of storytelling. We bridge this "narrative gap" with HoloCine, a model that generates entire scenes holistically to ensure global consistency from the first shot to the last. Our architecture achieves precise directorial control through a Window Cross-Attention mechanism that localizes text prompts to specific shots, while a Sparse Inter-Shot Self-Attention pattern (dense within shots but sparse between them) ensures the efficiency required for minute-scale generation. Beyond setting a new state of the art in narrative coherence, HoloCine develops remarkable emergent abilities: a persistent memory for characters and scenes, and an intuitive grasp of cinematic techniques. Our work marks a pivotal shift from clip synthesis towards automated filmmaking, making end-to-end cinematic creation a tangible future. Our code is available at: https://holo-cine.github.io/.
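To make the two attention patterns concrete, here is a minimal sketch of the boolean masks they imply. It is an illustration under stated assumptions, not the released HoloCine implementation: the abstract says only that self-attention is dense within shots and sparse between them, so the rule used here (a query sees just the first `summary_tokens` tokens of every other shot) is an assumed stand-in for whatever sparsity scheme the model actually uses. The function names and the per-shot token counts are likewise hypothetical.

```python
from typing import List


def shot_ids(shot_lengths: List[int]) -> List[int]:
    """Map each token position to the index of the shot it belongs to."""
    return [shot for shot, n in enumerate(shot_lengths) for _ in range(n)]


def sparse_inter_shot_mask(
    shot_lengths: List[int], summary_tokens: int = 1
) -> List[List[bool]]:
    """mask[q][k] is True when query token q may attend to key token k.

    Attention is dense inside a shot. Across shots, a query sees only the
    first `summary_tokens` tokens of each other shot. That cross-shot rule
    is an assumption made for this sketch.
    """
    ids = shot_ids(shot_lengths)
    # Position of each token inside its own shot.
    offsets = [i for n in shot_lengths for i in range(n)]
    total = len(ids)
    return [
        [ids[q] == ids[k] or offsets[k] < summary_tokens for k in range(total)]
        for q in range(total)
    ]


def window_cross_attention_mask(
    video_shot_lengths: List[int], text_shot_lengths: List[int]
) -> List[List[bool]]:
    """mask[q][k] is True when video token q may attend to text token k.

    Each video token is restricted to the prompt tokens of its own shot.
    """
    video_ids = shot_ids(video_shot_lengths)
    text_ids = shot_ids(text_shot_lengths)
    return [[v == t for t in text_ids] for v in video_ids]


# Toy scene: two shots with 3 and 2 video tokens, and 2 and 1 prompt tokens.
self_mask = sparse_inter_shot_mask([3, 2], summary_tokens=1)
cross_mask = window_cross_attention_mask([3, 2], [2, 1])
allowed = sum(sum(row) for row in self_mask)
```

In this toy scene the self-attention mask permits 18 of the 25 query-key pairs. With realistic shot lengths the saving is far larger, because under this assumed rule the cross-shot cost grows with the number of shots rather than with the total number of tokens.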