🤖 AI Summary
This work addresses the challenges of high computational cost and difficulty in maintaining global temporal consistency across thousands of frames in minute-scale long video editing. The authors propose a training-free, divide-and-conquer optical flow framework that segments the video for localized editing. To mitigate boundary flickering between adjacent segments, a Velocity Blend module fuses motion information from neighboring clips. Furthermore, an Attention Sink mechanism anchors global reference features to effectively suppress structural drift. Experimental results demonstrate that the proposed method significantly outperforms existing approaches in both temporal stability and semantic fidelity, enabling efficient and high-quality editing of long videos.
📝 Abstract
We propose MLV-Edit, a training-free, flow-based framework that addresses the unique challenges of minute-level video editing. While existing techniques excel at short-form video manipulation, scaling them to long-duration videos remains challenging due to prohibitive computational overhead and the difficulty of maintaining global temporal consistency across thousands of frames. To address this, MLV-Edit employs a divide-and-conquer strategy for segment-wise editing, facilitated by two core modules: Velocity Blend rectifies motion inconsistencies at segment boundaries by aligning the flow fields of adjacent chunks, eliminating the flickering and boundary artifacts commonly observed in fragmented video processing; and Attention Sink anchors local segment features to global reference frames, effectively suppressing cumulative structural drift. Extensive quantitative and qualitative experiments demonstrate that MLV-Edit consistently outperforms state-of-the-art methods in temporal stability and semantic fidelity.
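To make the boundary-alignment idea concrete, here is a minimal illustrative sketch (not the authors' code) of blending the flow (velocity) fields of two adjacent segments over a shared overlap window, in the spirit of the Velocity Blend module. The function name, array layout, and linear blending schedule are all assumptions for illustration.

```python
import numpy as np

def velocity_blend(flow_prev, flow_next):
    """Blend two stacks of per-frame flow fields of shape (T, H, W, 2)
    covering the same overlap window, ramping linearly from the previous
    segment's motion to the next segment's. Hypothetical helper; the
    paper's actual blending rule may differ."""
    assert flow_prev.shape == flow_next.shape
    T = flow_prev.shape[0]
    # Linear weights: frame 0 fully trusts the previous segment,
    # frame T-1 fully trusts the next one.
    w = np.linspace(0.0, 1.0, T).reshape(T, 1, 1, 1)
    return (1.0 - w) * flow_prev + w * flow_next

# Toy usage: two constant flow fields over a 5-frame overlap window.
prev = np.zeros((5, 4, 4, 2)); prev[..., 0] = 1.0  # uniform rightward motion
nxt = np.zeros((5, 4, 4, 2)); nxt[..., 1] = 1.0    # uniform downward motion
blended = velocity_blend(prev, nxt)
```

A smooth ramp like this avoids the hard cut in motion that would otherwise appear where one independently edited segment hands off to the next.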