A Benchmark and Multi-Agent System for Instruction-driven Cinematic Video Compilation

📅 2026-04-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing video compilation methods are largely confined to predefined tasks. They cannot automatically produce high-quality cinematic short videos guided by user instructions, and the field lacks a unified evaluation benchmark. To address this gap, this work introduces CineBench, the first benchmark for instruction-driven cinematic video compilation, and CineAgents, a multi-agent system that adopts a "design-and-compose" paradigm. The system reverse-engineers a script to construct a hierarchical narrative memory and applies iterative narrative planning to generate coherent compilation scripts. This design mitigates contextual collapse and temporal fragmentation, and in the reported experiments it significantly outperforms existing methods in narrative and logical coherence, demonstrating its effectiveness at producing high-quality cinematic short videos from user instructions.

📝 Abstract
The surging demand for adapting long-form cinematic content into short videos has created a need for versatile automatic video compilation systems. However, existing compilation methods are limited to predefined tasks, and the community lacks a comprehensive benchmark for evaluating cinematic compilation. To address this, we introduce CineBench, the first benchmark for instruction-driven cinematic video compilation, featuring diverse user instructions and high-quality ground-truth compilations annotated by professional editors. To overcome contextual collapse and temporal fragmentation, we present CineAgents, a multi-agent system that reformulates cinematic video compilation into a "design-and-compose" paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory that provides multi-level context, and it employs an iterative narrative planning process that refines a creative blueprint into a final compiled script. Extensive experiments demonstrate that CineAgents significantly outperforms existing methods, generating compilations with superior narrative and logical coherence.
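The abstract names two components without giving implementation details: a hierarchical narrative memory built by script reverse-engineering, and an iterative planning loop that refines a creative blueprint into a compiled script. The Python sketch below is only a rough illustration of that general shape under loudly stated assumptions. The three memory levels (synopsis, scenes, shots), every class and function name (`Shot`, `Scene`, `NarrativeMemory`, `find_shot`, `compose`), the keyword matching that stands in for LLM agents, and the "trim the last beat when over the time budget" revision rule are all hypothetical and are not taken from CineAgents.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shot:
    # Hypothetical shot record; times are positions in the source film (seconds).
    start: float
    end: float
    description: str  # caption assumed to come from script reverse-engineering

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass
class Scene:
    summary: str
    shots: list[Shot] = field(default_factory=list)


@dataclass
class NarrativeMemory:
    """Hypothetical three-level memory: film synopsis -> scene summaries -> shots."""
    synopsis: str
    scenes: list[Scene] = field(default_factory=list)

    def find_shot(self, beat: str) -> Shot | None:
        # Coarse-to-fine lookup: match the scene summary first, then a shot inside it.
        # Plain substring matching is a stand-in for an LLM retrieval agent.
        for scene in self.scenes:
            if beat in scene.summary:
                for shot in scene.shots:
                    if beat in shot.description:
                        return shot
        return None


def compose(memory: NarrativeMemory, blueprint: list[str],
            max_seconds: float, max_rounds: int = 3) -> list[Shot]:
    """Refine a blueprint (ordered story beats) into a shot list, in blueprint order.

    Each round maps every beat to a shot. If the cut exceeds the time budget,
    the final beat is dropped and the round repeats (a hypothetical stand-in
    for an agent's critique-and-revise step). If max_rounds runs out, the last
    attempt is returned even if it is still over budget.
    """
    beats = list(blueprint)
    script: list[Shot] = []
    for _ in range(max_rounds):
        candidates = [memory.find_shot(b) for b in beats]
        script = [s for s in candidates if s is not None]
        if sum(s.duration for s in script) <= max_seconds:
            break
        beats = beats[:-1]  # revise the blueprint by trimming its last beat
    return script


# Usage with toy data (invented for illustration only).
memory = NarrativeMemory(
    synopsis="A detective hunts a thief across the city.",
    scenes=[
        Scene("heist at the museum", [Shot(10.0, 18.0, "thief cracks the heist vault")]),
        Scene("chase on the rooftops", [Shot(300.0, 312.0, "detective starts the chase")]),
        Scene("arrest at the docks", [Shot(900.0, 905.0, "the arrest under rain")]),
    ],
)
cut = compose(memory, ["heist", "chase", "arrest"], max_seconds=22.0)
print([s.description for s in cut])
# ['thief cracks the heist vault', 'detective starts the chase']
```

In this toy run the first attempt totals 25 seconds, so the "arrest" beat is trimmed and the second attempt fits the 22-second budget. Dropping the final beat is just one simple revision rule; a real planner would presumably let a critic agent decide which beats to cut, reorder, or replace.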
Problem

Research questions and friction points this paper is trying to address.

instruction-driven video compilation
cinematic video editing
video summarization benchmark
narrative coherence
automatic video editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

instruction-driven video compilation
multi-agent system
narrative memory
iterative narrative planning
CineBench