🤖 AI Summary
Current video generation models struggle to produce long-duration, multi-scene, narratively coherent videos, and high-quality benchmark datasets supporting cross-shot character ID consistency and precise audio-visual-text alignment remain scarce. To address this, the authors introduce the first film-grade hierarchical dataset designed specifically for long-video generation, featuring a three-tier "script-scene-frame" annotation schema that enables multi-scene narrative modeling, cross-shot character ID tracking, and fine-grained alignment across visual, auditory, and textual modalities. The dataset fills a critical gap in evaluation and training benchmarks for long-video generation, systematically exposing new challenges such as character ID drift, and it provides a reproducible, quantitative evaluation protocol grounded in standardized metrics. The dataset is publicly released and actively maintained, establishing a foundational resource for developing and rigorously assessing next-generation long-video generation models.
📝 Abstract
Recent advancements in video generation models, like Stable Video Diffusion, show promising results, but they primarily focus on short, single-scene videos. These models struggle to generate long videos that involve multiple scenes, coherent narratives, and consistent characters. Furthermore, no publicly available dataset is tailored for the analysis, evaluation, and training of long video generation models. In this paper, we present MovieBench: A Hierarchical Movie-Level Dataset for Long Video Generation, which addresses these challenges through three unique contributions: (1) movie-length videos featuring rich, coherent storylines and multi-scene narratives, (2) consistency of character appearance and audio across scenes, and (3) a hierarchical data structure containing high-level movie information and detailed shot-level descriptions. Experiments demonstrate that MovieBench reveals new insights and challenges, such as maintaining character ID consistency across multiple scenes for various characters. The dataset will be publicly released and continuously maintained, aiming to advance the field of long video generation. Data can be found at: https://weijiawu.github.io/MovieBench/.