🤖 AI Summary
Current video re-editing workflows are complex, time-consuming, and heavily reliant on specialized expertise, so intuitive and flexible modifications akin to text editing are difficult to achieve. This work proposes a text-driven approach to video reauthoring built on a generative reconstruction algorithm that reverse-engineers an input video into an editable text prompt. The authors introduce Rewrite Kit, an interactive probe that lets users revise these prompts directly to rewrite video content. A technical evaluation of the algorithm reveals a gap between human and AI perception of video content, and a probe study with twelve creators surfaces novel use cases such as virtual reshooting, synthetic continuity, and aesthetic restyling. The study also highlights tensions around coherence, control, and creative alignment in this paradigm.
📝 Abstract
Video is a powerful medium for communication and storytelling, yet reauthoring existing footage remains challenging. Even simple edits often demand expertise, time, and careful planning, constraining how creators envision and shape their narratives. Recent advances in generative AI suggest a new paradigm: what if editing a video were as straightforward as rewriting text? To investigate this, we present a tech probe and a study on text-driven video reauthoring. Our approach involves two technical contributions: (1) a generative reconstruction algorithm that reverse-engineers video into an editable text prompt, and (2) an interactive probe, Rewrite Kit, that allows creators to manipulate these prompts. A technical evaluation of the algorithm reveals a critical human-AI perceptual gap. A probe study with 12 creators surfaced novel use cases such as virtual reshooting, synthetic continuity, and aesthetic restyling. It also highlighted key tensions around coherence, control, and creative alignment in this new paradigm. Our work contributes empirical insights into the opportunities and challenges of text-driven video reauthoring, offering design implications for future co-creative video tools.