MLV-Edit: Towards Consistent and Highly Efficient Editing for Minute-Level Videos

📅 2026-02-02
🤖 AI Summary
This work addresses two challenges in editing minute-scale videos: high computational cost and the difficulty of maintaining global temporal consistency across thousands of frames. The authors propose a training-free, flow-based framework that follows a divide-and-conquer strategy, splitting the video into segments that are edited individually. To mitigate flickering at the boundaries between adjacent segments, a Velocity Blend module aligns the flow fields of neighboring chunks. In addition, an Attention Sink mechanism anchors local segment features to global reference frames to suppress cumulative structural drift. The reported experiments show that the method consistently outperforms existing approaches in temporal stability and semantic fidelity.

📝 Abstract
We propose MLV-Edit, a training-free, flow-based framework that addresses the unique challenges of minute-level video editing. While existing techniques excel in short-form video manipulation, scaling them to long-duration videos remains challenging due to prohibitive computational overhead and the difficulty of maintaining global temporal consistency across thousands of frames. To address this, MLV-Edit employs a divide-and-conquer strategy for segment-wise editing, facilitated by two core modules: Velocity Blend rectifies motion inconsistencies at segment boundaries by aligning the flow fields of adjacent chunks, eliminating flickering and boundary artifacts commonly observed in fragmented video processing; and Attention Sink anchors local segment features to global reference frames, effectively suppressing cumulative structural drift. Extensive quantitative and qualitative experiments demonstrate that MLV-Edit consistently outperforms state-of-the-art methods in terms of temporal stability and semantic fidelity.
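The abstract does not give the exact formulation of Velocity Blend, so the following is only a rough illustration of the general idea of segment-wise processing with blending in the overlap between adjacent chunks. It assumes overlapping segments and triangular blending weights, and it uses one scalar per frame as a stand-in for a velocity (flow) field. The function names `split_into_segments` and `blend_velocities` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def split_into_segments(num_frames: int, seg_len: int, overlap: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges; adjacent ranges share `overlap` frames.

    Assumption: overlapping chunks. The paper may segment the video differently.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    stride = seg_len - overlap
    segments = []
    start = 0
    while True:
        end = min(start + seg_len, num_frames)
        segments.append((start, end))
        if end == num_frames:
            break
        start += stride
    return segments


def blend_velocities(
    num_frames: int,
    segments: Sequence[Tuple[int, int]],
    seg_velocities: Sequence[Sequence[float]],
) -> List[float]:
    """Combine per-segment velocity predictions into one per-frame sequence.

    Each frame's value is a weighted average over the segments that cover it.
    Weights are triangular (assumed here): small at a segment's edges and larger
    towards its centre, so the result changes gradually across a boundary.
    """
    total = [0.0] * num_frames
    weight = [0.0] * num_frames
    for (start, end), vel in zip(segments, seg_velocities):
        n = end - start
        for i in range(n):
            w = min(i + 1, n - i)  # ramps up from either edge of the segment
            total[start + i] += w * vel[i]
            weight[start + i] += w
    # Every frame is covered by at least one segment, so weight is never zero.
    return [t / w for t, w in zip(total, weight)]


if __name__ == "__main__":
    segs = split_into_segments(num_frames=6, seg_len=4, overlap=2)
    # Two chunks that disagree: the first predicts 0.0, the second 1.0.
    preds = [[0.0] * 4, [1.0] * 4]
    print(segs)
    print(blend_velocities(6, segs, preds))
```

In this toy setup the two chunks disagree completely, and the blended sequence moves gradually from the first chunk's value to the second's across the shared frames instead of jumping at a hard boundary. In a real flow-based editor the per-frame scalars would be tensors predicted by the model at each sampling step.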
Problem

Research questions and friction points this paper is trying to address.

minute-level video editing
temporal consistency
computational overhead
long-duration video
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow-based video editing
temporal consistency
segment-wise editing
motion alignment
structural drift suppression