🤖 AI Summary
In AI-assisted video editing, manually comparing many AI-generated variants is inefficient and yields inconsistent quality. To address this, we propose VideoDiff, an AI video editing tool built around a “substitutional collaboration” paradigm spanning the entire editing workflow: it batch-generates AI suggestions for rough-cut creation, B-roll insertion, and text-based effects. The tool supports precise multi-variant comparison through timeline alignment, transcript-based comparison, and differential-highlighting visualization, and it integrates diffusion-based video generation, multimodal (visual, temporal, textual) alignment, and an interactive regeneration interface for real-time filtering and iterative refinement. A user study with 12 participants demonstrates statistically significant improvements in variant selection speed (+42%) and final edit satisfaction (+38%), validating the framework’s efficacy. This work establishes a scalable human-AI collaborative architecture for professional video creation.
📝 Abstract
To make an engaging video, people sequence interesting moments and add visuals such as B-rolls or text. While video editing requires time and effort, AI has recently shown strong potential to make editing easier through suggestions and automation. A key strength of generative models is their ability to quickly generate multiple variations, but when provided with many alternatives, creators struggle to compare them to find the best fit. We propose VideoDiff, an AI video editing tool designed for editing with alternatives. With VideoDiff, creators can generate and review multiple AI recommendations for each editing process: creating a rough cut, inserting B-rolls, and adding text effects. VideoDiff simplifies comparisons by aligning videos and highlighting differences through timelines, transcripts, and video previews. Creators have the flexibility to regenerate and refine AI suggestions as they compare alternatives. Our study participants (N=12) could easily compare and customize alternatives, creating more satisfying results.
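The transcript-based difference highlighting described above can be sketched in miniature. The snippet below is an illustrative assumption, not the paper's implementation: it aligns two transcript variants word by word with Python's standard `difflib.SequenceMatcher` and tags the runs where they agree or diverge, which is the kind of signal a UI could use to highlight differences between alternatives.

```python
# Minimal sketch (hypothetical, not VideoDiff's actual code): align two
# transcript variants and mark shared vs. divergent word runs, in the
# spirit of transcript-based difference highlighting.
from difflib import SequenceMatcher

def transcript_diff(words_a, words_b):
    """Return (tag, segment_a, segment_b) tuples for two transcripts.

    tag is one of 'equal', 'replace', 'delete', 'insert', following
    SequenceMatcher.get_opcodes().
    """
    sm = SequenceMatcher(a=words_a, b=words_b)
    return [(tag, words_a[i1:i2], words_b[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()]

# Two hypothetical variants of the same rough-cut narration:
variant_a = "we tested the feature on twelve users".split()
variant_b = "we evaluated the feature with twelve users".split()

for tag, seg_a, seg_b in transcript_diff(variant_a, variant_b):
    print(tag, seg_a, seg_b)
```

A real system would align at the level of timed transcript segments rather than bare words, so each divergent run maps back to a span on the video timeline.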