SeriesBench: A Benchmark for Narrative-Driven Drama Series Understanding

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Background: Existing multimodal large language models (MLLMs) are typically evaluated on video understanding with benchmarks built from standalone videos, which fail to capture the continuous, narrative-driven series common in real-world content. Method: We introduce SeriesBench, a multi-task benchmark for series-level narrative understanding, comprising 105 narrative-driven drama series and 28 specialized narrative tasks. It features a novel long-span narrative annotation method and a full-information transformation approach that converts manual annotations into diverse task formats. We further propose PC-DCoT, a narrative reasoning framework designed to analyze plot structures and character relationships within a series. Contribution/Results: Experiments reveal significant bottlenecks in current MLLMs' understanding of narrative-driven series. PC-DCoT boosts the average accuracy of mainstream models on SeriesBench by 19.7%. The benchmark is publicly released and was accepted at CVPR 2025.

📝 Abstract
With the rapid development of Multi-modal Large Language Models (MLLMs), an increasing number of benchmarks have been established to evaluate the video understanding capabilities of these models. However, these benchmarks focus on standalone videos and mainly assess "visual elements" such as human actions and object states. In reality, contemporary videos often encompass complex and continuous narratives, typically presented as a series. To address this gap, we propose SeriesBench, a benchmark consisting of 105 carefully curated narrative-driven series and covering 28 specialized tasks that require deep narrative understanding. Specifically, we first select a diverse set of drama series spanning various genres. Then, we introduce a novel long-span narrative annotation method, combined with a full-information transformation approach that converts manual annotations into diverse task formats. To further enhance the models' capacity to analyze plot structures and character relationships within a series in detail, we propose a novel narrative reasoning framework, PC-DCoT. Extensive results on SeriesBench indicate that existing MLLMs still face significant challenges in understanding narrative-driven series, while PC-DCoT improves the performance of these MLLMs. Overall, SeriesBench and PC-DCoT highlight the critical need to advance model capabilities for understanding narrative-driven series, guiding the future development of MLLMs. SeriesBench is publicly available at https://github.com/zackhxn/SeriesBench-CVPR2025.
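The abstract says manual long-span annotations are converted into diverse task formats, but it does not specify the annotation schema or which formats are produced. The sketch below is a hypothetical illustration only: the `NarrativeAnnotation` record, its fields, the three output formats (multiple choice, true/false, open-ended), and the reading of "full-information" as "every annotated field is reused in some format" are all assumptions, not the SeriesBench pipeline.

```python
import random
import string
from dataclasses import dataclass, field


@dataclass
class NarrativeAnnotation:
    """Hypothetical record for one manual long-span annotation (not SeriesBench's schema)."""
    series_id: str
    task: str                 # hypothetical task name, e.g. "character_relationship"
    question: str
    answer: str               # the annotated ground truth
    distractors: list = field(default_factory=list)  # annotated wrong answers


def to_multiple_choice(ann: NarrativeAnnotation, seed: int = 0) -> dict:
    """Shuffle the answer among the distractors and record which letter is correct."""
    options = [ann.answer] + list(ann.distractors)
    random.Random(seed).shuffle(options)  # seeded so the item is reproducible
    letters = string.ascii_uppercase[: len(options)]
    return {
        "format": "multiple_choice",
        "series_id": ann.series_id,
        "task": ann.task,
        "question": ann.question,
        "options": dict(zip(letters, options)),
        "label": letters[options.index(ann.answer)],
    }


def to_true_false(ann: NarrativeAnnotation) -> list:
    """One true statement from the answer, one false statement per distractor."""
    pairs = [(ann.answer, True)] + [(d, False) for d in ann.distractors]
    return [
        {"format": "true_false", "series_id": ann.series_id, "task": ann.task,
         "question": ann.question, "statement": s, "label": y}
        for s, y in pairs
    ]


def to_open_ended(ann: NarrativeAnnotation) -> dict:
    """Keep the answer as a free-form reference for open-ended scoring."""
    return {"format": "open_ended", "series_id": ann.series_id, "task": ann.task,
            "question": ann.question, "reference": ann.answer}


def transform(ann: NarrativeAnnotation) -> list:
    """Expand one annotation into every format so no annotated field is discarded."""
    return [to_multiple_choice(ann), *to_true_false(ann), to_open_ended(ann)]


if __name__ == "__main__":
    demo = NarrativeAnnotation(
        series_id="demo_001",
        task="character_relationship",
        question="How is the waiter related to the heroine?",
        answer="He is her estranged brother.",
        distractors=["He is her fiance.", "He is her employer.", "They are strangers."],
    )
    items = transform(demo)
    print(len(items))  # 6: one multiple-choice, four true/false, one open-ended
```

Generating every format from the same record is one way to keep a single annotation effort feeding many task types; the actual benchmark may use different formats or derive them differently.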
Problem

Evaluating MLLMs' narrative understanding in drama series
Addressing the lack of benchmarks for continuous video narratives
Improving model analysis of plot structures and character relationships
Innovation

Introduces SeriesBench for narrative-driven series evaluation
Develops a long-span narrative annotation method
Proposes the PC-DCoT framework for narrative reasoning
Chenkai Zhang
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University; Hangzhou Innovation Institute, Beihang University, Hangzhou, China
Yiming Lei
State Key Laboratory of Virtual Reality Technology and Systems, Beihang University
Zeming Liu
School of Computer Science and Engineering, Beihang University, Beijing, China
Haitao Leng
Kuaishou Technology
Shaoguo Liu
Alibaba Corporation
Machine Learning, Computer Vision
Tingting Gao
Kuaishou Technology
Qingjie Liu
Professor, School of Computer Science and Engineering, Beihang University
Computer Vision and Pattern Recognition
Yunhong Wang
Professor, School of Computer Science and Engineering, Beihang University
Biometrics, Pattern Recognition, Image Processing, Computer Vision