EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing

📅 2026-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods struggle to achieve high-quality removal of dynamic objects in videos along with their associated visual effects—such as deformations, shadows, and reflections—and are further hindered by the absence of paired datasets for training and evaluation. To address these challenges, this work introduces VOR, a large-scale video object removal dataset, and proposes EffectErase, a novel approach that reformulates object removal by incorporating video object insertion as its inverse task. Within a diffusion-based video inpainting framework, EffectErase employs a mutually inverse learning paradigm that jointly optimizes removal and insertion processes, enhanced by task-aware region guidance and an insertion–removal consistency constraint. This design significantly improves the fidelity of restored effect-laden regions. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art techniques in both synthetic and real-world scenarios.

📝 Abstract
Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects themselves but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered by the lack of a comprehensive dataset that systematically captures common object effects across varied environments for training and evaluation. To address this, we introduce VOR (Video Object Removal), a large-scale dataset of diverse paired videos, each consisting of one video in which the target object is present with its effects and a counterpart in which the object and effects are absent, together with corresponding object masks. VOR contains 60K high-quality video pairs from captured and synthetic sources, covers five effect types, and spans a wide range of object categories as well as complex, dynamic multi-object scenes. Building on VOR, we propose EffectErase, an effect-aware video object removal method that treats video object insertion as an inverse auxiliary task within a reciprocal learning scheme. The model includes task-aware region guidance, which focuses learning on affected areas and enables flexible task switching, and an insertion-removal consistency objective, which encourages complementary behaviors and shared localization of effect regions and structural cues. Trained on VOR, EffectErase achieves superior performance in extensive experiments, delivering high-quality video object effect erasing across diverse scenarios.
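The reciprocal scheme described above can be sketched in miniature: if a removal model f strips the object and its effects and an insertion model g puts them back, then g(f(x)) should reconstruct the original clip inside a task-aware region (the object mask dilated to cover shadows and reflections), while f should leave pixels outside that region untouched. The sketch below is a minimal, illustrative toy in numpy; all function names, the dilation-based region guidance, and the L1 weighting are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def task_aware_region(mask, dilation=1):
    """Dilate a binary object mask so it also covers nearby effect areas
    (shadows, reflections) -- a simple stand-in for task-aware region
    guidance. `dilation` counts 4-neighborhood dilation steps."""
    region = mask.astype(bool).copy()
    for _ in range(dilation):
        padded = np.pad(region, 1)
        region = (
            padded[1:-1, 1:-1] | padded[:-2, 1:-1] | padded[2:, 1:-1]
            | padded[1:-1, :-2] | padded[1:-1, 2:]
        )
    return region.astype(np.float32)

def consistency_loss(x, removed, reinserted, mask, dilation=1):
    """Toy insertion-removal consistency penalty: inside the dilated
    effect region, insertion(removal(x)) should match x; outside it,
    removal should leave x unchanged. Returns a scalar L1 loss."""
    region = task_aware_region(mask, dilation)
    inside = np.abs(reinserted - x) * region          # reconstruction term
    outside = np.abs(removed - x) * (1.0 - region)    # background-preservation term
    return float(inside.mean() + outside.mean())
```

A perfect pair of mutual inverses drives both terms to zero; in practice this objective would be one term alongside the diffusion inpainting losses, with `removed` and `reinserted` produced by the two task branches.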
Problem

Research questions and friction points this paper is trying to address.

video object removal
visual effects erasing
video inpainting
dataset limitation
background restoration
Innovation

Methods, ideas, or system contributions that make the work stand out.

video object removal
effect erasing
reciprocal learning
diffusion-based inpainting
VOR dataset
Yang Fu
Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China
Yike Zheng
Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China
Ziyun Dai
Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China
Henghui Ding
Fudan University
Computer Vision · Machine Learning · Segmentation · AIGC