Prompt-Driven Agentic Video Editing System: Autonomous Comprehension of Long-Form, Story-Driven Media

📅 2025-09-20

📈 Citations: 0

✨ Influential: 0

career value

209K/year

🤖 AI Summary

In long-form narrative video editing, creators face high cognitive load when locating plot points, tracking character motivations, and reassembling distributed events across multi-hour footage; existing transcription- or embedding-based methods lack narrative understanding and fail to support creative decision-making. Method: We propose the first prompt-driven, modular video editing architecture integrating semantic indexing, temporal segmentation, guided memory compression, and cross-granularity narrative fusion to enable interpretable modeling of plot, dialogue, emotion, and context. The system replaces traditional timeline operations with natural language prompts, balancing automation efficiency with editorial control. Contribution/Results: Evaluated on 400+ videos, our method significantly improves editing efficiency while strictly preserving narrative coherence. Professional editors rated it highly in usability and creative efficacy, and user studies demonstrated strong preference over baseline approaches.

Technology Category

Application Category

📝 Abstract

Creators struggle to edit long-form, narrative-rich videos not because of UI complexity, but due to the cognitive demands of searching, storyboarding, and sequencing hours of footage. Existing transcript- or embedding-based methods fall short for creative workflows, as models struggle to track characters, infer motivations, and connect dispersed events. We present a prompt-driven, modular editing system that helps creators restructure multi-hour content through free-form prompts rather than timelines. At its core is a semantic indexing pipeline that builds a global narrative via temporal segmentation, guided memory compression, and cross-granularity fusion, producing interpretable traces of plot, dialogue, emotion, and context. Users receive cinematic edits while optionally refining transparent intermediate outputs. Evaluated on 400+ videos with expert ratings, QA, and preference studies, our system scales prompt-driven editing, preserves narrative coherence, and balances automation with creator control.

Problem

Research questions and friction points this paper is trying to address.

Editing long narrative videos requires overcoming cognitive demands of storyboarding and sequencing

Existing methods fail to track characters and connect dispersed events in creative workflows

Creators need systems that preserve narrative coherence while allowing prompt-driven editing control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular editing system using free-form prompts

Semantic indexing pipeline with narrative compression

Transparent intermediate outputs for creator refinement

🔎 Similar Papers

No similar papers found.