EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an efficient video inpainting and editing framework that overcomes the high computational cost of existing generative approaches, which typically process the entire video context even for localized edits. The method introduces a locality-first generation mechanism—achieved through a local video context module that operates exclusively on masked regions—whose computational complexity scales proportionally with the size of the edited area. Temporal consistency is preserved via a lightweight temporal global context embedder, while decoupling local and global control enables multi-region text-guided editing and autoregressive content propagation. Experiments demonstrate that the approach surpasses full-attention baselines in editing quality while achieving a tenfold improvement in computational efficiency, enabling real-time multi-region video editing.

📝 Abstract
High-fidelity generative video editing has seen significant quality improvements by leveraging pre-trained video foundation models. However, their computational cost is a major bottleneck: they typically process the full video context regardless of the inpainting mask's size, even for sparse, localized edits. In this paper, we introduce EditCtrl, an efficient video inpainting control framework that focuses computation only where it is needed. Our approach features a novel local video context module that operates solely on masked tokens, yielding a computational cost proportional to the edit size. This local-first generation is then guided by a lightweight temporal global context embedder that ensures video-wide context consistency with minimal overhead. EditCtrl is not only 10 times more compute-efficient than state-of-the-art generative editing methods; it also improves editing quality over full-attention designs. Finally, we showcase how EditCtrl unlocks new capabilities, including multi-region editing with text prompts and autoregressive content propagation.
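The core efficiency idea in the abstract, attending only over masked tokens plus a compact global context so cost scales with the edit size rather than the full video, can be illustrated with a minimal NumPy sketch. All names here (`masked_token_attention`, `global_ctx`) are illustrative assumptions, not the paper's actual API or architecture:

```python
import numpy as np

def masked_token_attention(tokens, mask, global_ctx):
    """Sketch of locality-first generation: only the m masked (edited)
    tokens attend, over themselves plus one global context token, so the
    attention cost is O(m^2) rather than O(n^2) for n video tokens.
    Unmasked tokens pass through unchanged."""
    idx = np.flatnonzero(mask)                 # indices of edited tokens
    local = tokens[idx]                        # (m, d), m << n for sparse edits
    kv = np.vstack([local, global_ctx[None]])  # keys/values: local + global token
    d = tokens.shape[1]
    scores = local @ kv.T / np.sqrt(d)         # (m, m + 1) attention logits
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    out = tokens.copy()
    out[idx] = weights @ kv                    # update only the masked tokens
    return out
```

In this toy form, a 10% edit region means attention over roughly 10% of the tokens, which is the intuition behind the tenfold efficiency claim; the paper's actual module and global embedder are of course more involved.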
Problem

Research questions and friction points this paper is trying to address.

generative video editing
computational efficiency
video inpainting
local editing
global context
Innovation

Methods, ideas, or system contributions that make the work stand out.

disentangled control
local video context
efficient video inpainting
temporal global consistency
generative video editing
👥 Authors
Yehonathan Litman, Meta Reality Labs
Shikun Liu, Meta AI
Dario Seyb, Meta Reality Labs
Nicholas Milef, Meta Reality Labs
Yang Zhou, Meta Reality Labs
Carl Marshall, Meta
Shubham Tulsiani, Carnegie Mellon University
Caleb Leak, Meta Reality Labs