NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing

📅 2026-03-03
🤖 AI Summary
Existing video editing methods either depend on scarce paired data to achieve high-quality local edits or, in unpaired settings, struggle to preserve background content and temporal consistency at the same time. The authors propose NOVA, a framework that couples sparse control with dense synthesis. Users supply only a few edited keyframes as semantic guidance, while a dense branch draws motion and texture information from the original video to keep the result faithful to it. A degradation-simulation training strategy, which trains the model on artificially degraded videos, lets NOVA learn motion reconstruction and temporal coherence without paired data. In the reported experiments, NOVA outperforms existing approaches in edit fidelity, motion preservation, and temporal coherence.

📝 Abstract
Recent video editing models have achieved impressive results, but most still require large-scale paired datasets. Collecting such naturally aligned pairs at scale remains highly challenging and constitutes a critical bottleneck, especially for local video editing data. Existing workarounds transfer image editing to video through global motion control for pair-free video editing, but such designs struggle with background and temporal consistency. In this paper, we propose NOVA: Sparse Control & Dense Synthesis, a new framework for unpaired video editing. Specifically, the sparse branch provides semantic guidance through user-edited keyframes distributed across the video, and the dense branch continuously incorporates motion and texture information from the original video to maintain high fidelity and coherence. Moreover, we introduce a degradation-simulation training strategy that enables the model to learn motion reconstruction and temporal consistency by training on artificially degraded videos, thus eliminating the need for paired data. Our extensive experiments demonstrate that NOVA outperforms existing approaches in edit fidelity, motion preservation, and temporal coherence.
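
To make the degradation-simulation idea concrete, the following is a minimal sketch of how one pair-free training sample might be assembled from a single unedited video. It is not the authors' implementation: the abstract does not specify which degradations NOVA uses, so the blur-by-pooling plus Gaussian noise here, the evenly spaced keyframe selection, and all function and key names (`select_keyframes`, `degrade`, `make_training_sample`, `sparse_keyframes`, `dense_condition`) are assumptions chosen for illustration.

```python
import numpy as np


def select_keyframes(num_frames, num_keys):
    """Return evenly spaced frame indices, including the first and last frame."""
    return np.linspace(0, num_frames - 1, num_keys).round().astype(int)


def degrade(video, factor=4, noise_std=0.05, rng=None):
    """Hypothetical degradation (an assumption, not taken from the paper).

    Average-pools each frame by `factor`, upsamples back with nearest-neighbour
    repetition, then adds Gaussian noise. `video` has shape (T, H, W, C) with
    values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, c = video.shape
    assert h % factor == 0 and w % factor == 0, "H and W must be divisible by factor"
    pooled = video.reshape(t, h // factor, factor, w // factor, factor, c).mean(axis=(2, 4))
    upsampled = pooled.repeat(factor, axis=1).repeat(factor, axis=2)
    noisy = upsampled + rng.normal(0.0, noise_std, size=upsampled.shape)
    return np.clip(noisy, 0.0, 1.0).astype(video.dtype)


def make_training_sample(video, num_keys=3, rng=None):
    """Build one training sample from a single video, with no paired edit.

    Assumption: clean frames of the same video stand in for the user-edited
    keyframes during training, and the model is asked to reconstruct the
    clean video from those keyframes plus a degraded copy.
    """
    idx = select_keyframes(video.shape[0], num_keys)
    return {
        "sparse_keyframes": video[idx],              # sparse-branch input
        "keyframe_indices": idx,                     # where the keyframes sit in time
        "dense_condition": degrade(video, rng=rng),  # dense-branch input
        "target": video,                             # reconstruction target
    }
```

Under these assumptions, the degraded copy still carries the motion and coarse texture of the original video while the fine appearance has to come from the keyframes, which mirrors the division of labour between the dense and sparse branches described in the abstract.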
Problem

Research questions and friction points this paper is trying to address.

video editing
unpaired data
temporal consistency
background coherence
paired datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Control
Dense Synthesis
Pair-Free Video Editing
Degradation-Simulation Training
Temporal Coherence