Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In long-form narrative video editing, creators face high cognitive load when locating plot points, tracking character motivations, and reassembling distributed events across multi-hour footage; existing transcription- or embedding-based methods lack narrative understanding and fail to support creative decision-making. Method: We propose the first prompt-driven, modular video editing architecture integrating semantic indexing, temporal segmentation, guided memory compression, and cross-granularity narrative fusion to enable interpretable modeling of plot, dialogue, emotion, and context. The system replaces traditional timeline operations with natural language prompts, balancing automation efficiency with editorial control. Contribution/Results: Evaluated on 400+ videos, our method significantly improves editing efficiency while strictly preserving narrative coherence. Professional editors rated it highly in usability and creative efficacy, and user studies demonstrated strong preference over baseline approaches.

Technology Category

Application Category

📝 Abstract
Creators struggle to edit long-form, narrative-rich videos not because of UI complexity, but due to the cognitive demands of searching, storyboarding, and sequencing hours of footage. Existing transcript- or embedding-based methods fall short for creative workflows, as models struggle to track characters, infer motivations, and connect dispersed events. We present a prompt-driven, modular editing system that helps creators restructure multi-hour content through free-form prompts rather than timelines. At its core is a semantic indexing pipeline that builds a global narrative via temporal segmentation, guided memory compression, and cross-granularity fusion, producing interpretable traces of plot, dialogue, emotion, and context. Users receive cinematic edits while optionally refining transparent intermediate outputs. Evaluated on 400+ videos with expert ratings, QA, and preference studies, our system scales prompt-driven editing, preserves narrative coherence, and balances automation with creator control.
Problem

Research questions and friction points this paper is trying to address.

Editing long narrative videos requires overcoming cognitive demands of storyboarding and sequencing
Existing methods fail to track characters and connect dispersed events in creative workflows
Creators need systems that preserve narrative coherence while allowing prompt-driven editing control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular editing system using free-form prompts
Semantic indexing pipeline with narrative compression
Transparent intermediate outputs for creator refinement
🔎 Similar Papers
No similar papers found.
Z
Zihan Ding
University of British Columbia, Canada
J
Junlong Chen
Department of Engineering, University of Cambridge, United Kingdom
Per Ola Kristensson
Per Ola Kristensson
Professor of Interactive Systems Engineering, Department of Engineering, University of Cambridge
Human-Computer InteractionIntelligent Interactive SystemsSpeech and Language ProcessingVirtual and Augmented Reality
J
Junxiao Shen
School of Computer Science, University of Bristol, United Kingdom and Memories.AI, United States
X
Xinyi Wang
University of Bristol, United Kingdom