🤖 AI Summary
Existing video compilation methods are largely confined to predefined tasks and cannot automatically produce high-quality cinematic short videos from user instructions, and the field lacks a unified evaluation benchmark. To address this gap, this work introduces CineBench, the first benchmark for instruction-driven cinematic video compilation, and CineAgents, a multi-agent system that adopts a “design-and-compose” paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory and uses iterative narrative planning to produce a coherent compilation script. This mitigates contextual collapse and temporal fragmentation, and in experiments CineAgents significantly outperforms existing methods in narrative and logical coherence.
📝 Abstract
The surging demand for adapting long-form cinematic content into short videos has created a need for versatile automatic video compilation systems. However, existing compilation methods are limited to predefined tasks, and the community lacks a comprehensive benchmark for evaluating cinematic compilation. To address this, we introduce CineBench, the first benchmark for instruction-driven cinematic video compilation, featuring diverse user instructions and high-quality ground-truth compilations annotated by professional editors. To overcome contextual collapse and temporal fragmentation, we present CineAgents, a multi-agent system that reformulates cinematic video compilation into a “design-and-compose” paradigm. CineAgents performs script reverse-engineering to construct a hierarchical narrative memory that provides multi-level context, and employs an iterative narrative planning process that refines a creative blueprint into a final compiled script. Extensive experiments demonstrate that CineAgents significantly outperforms existing methods, generating compilations with superior narrative and logical coherence.