V2Edit: Versatile Video Diffusion Editor for Videos and 3D Scenes

📅 2025-03-13
🤖 AI Summary
This work addresses a fundamental challenge in video and 3D scene editing: reconciling *temporal and geometric consistency* with *large-scale geometric modifications*. To this end, we propose a training-free, instruction-driven editing framework. Methodologically, it decomposes each edit into progressive subtasks and, for 3D scenes, uses a render-edit-reconstruct pipeline. It introduces a triple synergistic control mechanism: (i) initial-noise anchoring to preserve structural priors, (ii) stepwise noise modulation to control how the geometry evolves, and (iii) cross-attention guidance between text and video features to enforce semantic alignment. This is the first approach to enable spatiotemporally and 3D-consistent editing under *geometrically significant transformations*. Evaluated on a range of challenging video editing tasks and complex 3D scene editing tasks, our method achieves state-of-the-art performance, delivering high-fidelity, geometrically plausible, and spatiotemporally coherent edits without fine-tuning or domain-specific training.
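Item (iii) relies on cross-attention maps between text and video features. As background, the sketch below computes such a map with standard scaled dot-product attention, softmax(QKᵀ/√d), over toy features, and then mixes a source-pass map with an edit-pass map. The features are assumed to be already projected into queries and keys, and `blend_maps` with its `alpha` weight is a hypothetical guidance rule in the spirit of attention-injection editors such as Prompt-to-Prompt, not V²Edit's published mechanism.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_map(video_queries: Matrix, text_keys: Matrix) -> Matrix:
    """Scaled dot-product attention weights softmax(Q K^T / sqrt(d)).

    Rows index video tokens (queries), columns index text tokens (keys);
    each row sums to 1 and says which words a video token attends to.
    """
    d = len(text_keys[0])
    scale = 1.0 / math.sqrt(d)
    return [softmax([scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in text_keys])
            for q in video_queries]


def blend_maps(source_map: Matrix, edit_map: Matrix, alpha: float) -> Matrix:
    """Hypothetical guidance rule: convex mix of source-pass and edit-pass maps.

    alpha = 0 reuses the source map (preserve layout); alpha = 1 uses the edit map.
    A convex combination of row-stochastic maps is still row-stochastic.
    """
    return [[(1 - alpha) * s + alpha * e for s, e in zip(s_row, e_row)]
            for s_row, e_row in zip(source_map, edit_map)]


# Toy example: 3 video tokens and 2 text tokens with 2-dim features.
queries = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
source_keys = [[2.0, 0.0], [0.0, 2.0]]
edit_keys = [[0.0, 2.0], [2.0, 0.0]]  # edit prompt swaps which word matches which region
src_map = cross_attention_map(queries, source_keys)
edit_map = cross_attention_map(queries, edit_keys)
guided = blend_maps(src_map, edit_map, alpha=0.3)
```

Keeping `alpha` small leans on the source video's attention layout, which is one simple way to trade content preservation against how strongly the edit prompt takes over.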

πŸ“ Abstract
This paper introduces V²Edit, a novel training-free framework for instruction-guided video and 3D scene editing. Addressing the critical challenge of balancing original content preservation with editing task fulfillment, our approach employs a progressive strategy that decomposes complex editing tasks into a sequence of simpler subtasks. Each subtask is controlled through three synergistic mechanisms: the initial noise, the noise added at each denoising step, and the cross-attention maps between text prompts and video content. This ensures robust preservation of original video elements while effectively applying the desired edits. Beyond its native video editing capability, we extend V²Edit to 3D scene editing via a "render-edit-reconstruct" process, enabling high-quality, 3D-consistent edits even for tasks involving substantial geometric changes such as object insertion. Extensive experiments demonstrate that V²Edit achieves high-quality, successful edits across various challenging video editing tasks and complex 3D scene editing tasks, establishing state-of-the-art performance in both domains.
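The abstract names the initial noise and the noise added at each denoising step as two of the three control handles. To show where both enter a diffusion sampler, the sketch below implements the standard DDIM update: the starting latent is one handle, and the `sigma * z` term, scaled by `eta`, is the other. Building the starting latent by forward-noising the source (`forward_noise`) is an illustrative assumption (DDIM inversion is a common alternative), and how V²Edit actually sets these quantities for each subtask is not specified here.

```python
import math
import random
from typing import List


def forward_noise(x0: List[float], alpha_bar_t: float, rng: random.Random) -> List[float]:
    """Sample q(x_t | x0) = N(sqrt(a) * x0, (1 - a) I).

    Illustrative way to build a starting latent anchored on the source;
    a larger alpha_bar_t keeps more of the source structure.
    """
    return [math.sqrt(alpha_bar_t) * v + math.sqrt(1 - alpha_bar_t) * rng.gauss(0.0, 1.0)
            for v in x0]


def ddim_step(x_t: List[float], eps_hat: List[float], alpha_bar_t: float,
              alpha_bar_prev: float, eta: float, rng: random.Random) -> List[float]:
    """One DDIM update from step t to t-1 (requires alpha_bar_t < alpha_bar_prev <= 1).

    eta = 0 gives the deterministic DDIM sampler; eta = 1 matches the DDPM
    posterior variance. The sigma * z term is the noise injected at this step.
    """
    sigma = (eta * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t))
             * math.sqrt(1 - alpha_bar_t / alpha_bar_prev))
    # Coefficient of the "direction pointing to x_t"; clamp tiny negative round-off.
    dir_coef = math.sqrt(max(0.0, 1 - alpha_bar_prev - sigma ** 2))
    out = []
    for x, e in zip(x_t, eps_hat):
        # Predicted clean sample implied by the current latent and noise estimate.
        x0_hat = (x - math.sqrt(1 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        out.append(math.sqrt(alpha_bar_prev) * x0_hat + dir_coef * e
                   + sigma * rng.gauss(0.0, 1.0))
    return out
```

With `eta=0` and a perfect noise estimate, the step maps a latent at level t exactly onto level t-1, which makes it easy to check; raising `eta` trades that determinism for fresh per-step noise.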
Problem

Research questions and friction points this paper is trying to address.

Balancing preservation of the original content with fulfillment of the editing task
Decomposing complex video and 3D editing into simpler subtasks
Achieving high-quality, 3D-consistent edits in complex scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework for video and 3D editing
Progressive strategy with synergistic control mechanisms
Render-edit-reconstruct process for 3D scene editing
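The render-edit-reconstruct process can be sketched as a three-stage loop. In this minimal sketch the renderer, the video editor, and the reconstruction routine are hypothetical callables supplied by the caller (for example a NeRF or 3D Gaussian Splatting renderer and fitter). Rendering the views along a camera path so that they form one video is an assumption about how the 3D extension feeds the video editor, and no multi-round refinement is modeled.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # scene representation, e.g. a NeRF or 3D Gaussian Splatting model
C = TypeVar("C")  # camera pose
F = TypeVar("F")  # rendered frame


def render_edit_reconstruct(
    scene: S,
    camera_path: Sequence[C],
    render: Callable[[S, C], F],
    edit_video: Callable[[List[F]], List[F]],
    reconstruct: Callable[[List[C], List[F]], S],
) -> S:
    """Render the scene along a camera path, edit the frames as one video, refit the scene."""
    cameras = list(camera_path)
    # 1. Render: ordered views along the camera path form a video clip.
    frames = [render(scene, cam) for cam in cameras]
    # 2. Edit: all frames go through the video editor together (assumed to be
    #    temporally consistent, which is what should carry over to multi-view consistency).
    edited = edit_video(frames)
    if len(edited) != len(frames):
        raise ValueError("edit_video must return exactly one frame per camera")
    # 3. Reconstruct: fit a new scene to the edited views and their camera poses.
    return reconstruct(cameras, edited)


# Toy usage: the "scene" maps a camera id to the brightness seen from it.
toy_scene = {0: 0.1, 1: 0.4, 2: 0.7}
edited_scene = render_edit_reconstruct(
    toy_scene,
    camera_path=[0, 1, 2],
    render=lambda scene, cam: scene[cam],
    edit_video=lambda frames: [f + 0.2 for f in frames],  # stand-in for the video editor
    reconstruct=lambda cams, frames: dict(zip(cams, frames)),
)
```

Passing the stages as functions keeps the loop independent of any particular 3D representation; the toy usage only checks the data flow, not editing quality.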
πŸ”Ž Similar Papers
No similar papers found.