O-DisCo-Edit: Object Distortion Control for Unified Realistic Video Editing

📅 2025-09-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing video editing diffusion models require task-specific control signals, which complicates modeling, raises training costs, and makes it hard to combine a unified architecture with fine-grained editing of object attributes. To address this, we propose O-DisCo-Edit, the first method to introduce Object-based Distortion Control (O-DisCo): a control signal driven by random, adaptive noise that encodes diverse editing instructions in a single, generic representation. It is coupled with a "copy-form" preservation module that explicitly safeguards non-edited regions during diffusion by copying latent features from the source frame. The approach removes the need for task-specific architectures or fine-tuning, which greatly simplifies multi-task modeling. Extensive experiments across multiple video editing tasks, including object replacement, attribute modification, and motion retargeting, show that it outperforms both dedicated and multi-task state-of-the-art methods. Quantitative metrics and human evaluations both indicate higher fidelity, more precise controllability, and strong generalization.
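To make the idea of copying source features into non-edited regions concrete, here is a minimal sketch in plain Python. It is not the paper's module: the function name `copy_preserve`, the nested-list "latent" grids, and the binary `edit_mask` are all assumptions made for illustration, and a real system would operate on latent tensors inside the diffusion loop.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]


def copy_preserve(edited: Grid, source: Grid, edit_mask: Mask) -> Grid:
    """Keep edited values where edit_mask is 1; copy source values elsewhere.

    Hypothetical helper: a toy stand-in for preserving non-edited regions.
    """
    return [
        [e if m else s for e, s, m in zip(e_row, s_row, m_row)]
        for e_row, s_row, m_row in zip(edited, source, edit_mask)
    ]


# Toy 2x2 example: only the top-right cell is marked as edited.
source = [[1.0, 1.0], [1.0, 1.0]]
edited = [[9.0, 9.0], [9.0, 9.0]]
mask = [[0, 1], [0, 0]]
result = copy_preserve(edited, source, mask)
```

In this toy case only the masked cell takes the edited value, and every other cell is copied unchanged from the source.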

📝 Abstract

Diffusion models have recently advanced video editing, yet controllable editing remains challenging due to the need for precise manipulation of diverse object properties. Current methods require a different control signal for each editing task, which complicates model design and demands significant training resources. To address this, we propose O-DisCo-Edit, a unified framework that incorporates a novel object distortion control (O-DisCo). This signal, based on random and adaptive noise, flexibly encapsulates a wide range of editing cues within a single representation. Paired with a "copy-form" preservation module for preserving non-edited regions, O-DisCo-Edit enables efficient, high-fidelity editing through an effective training paradigm. Extensive experiments and comprehensive human evaluations consistently demonstrate that O-DisCo-Edit surpasses both specialized and multitask state-of-the-art methods across various video editing tasks.

Project page: https://cyqii.github.io/O-DisCo-Edit.github.io/
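The abstract does not say how the distortion signal is constructed, so the following is only a rough sketch of one way an object region could be distorted with random noise. The function `distort_object`, its parameters, and the choice of Gaussian noise with a uniformly sampled strength are assumptions for illustration; the "adaptive" part of the signal is not modeled here.

```python
import random
from typing import List, Optional

Grid = List[List[float]]
Mask = List[List[int]]


def distort_object(
    frame: Grid,
    object_mask: Mask,
    max_strength: float = 1.0,
    seed: Optional[int] = None,
) -> Grid:
    """Add Gaussian noise of a randomly drawn strength inside the object mask.

    Hypothetical helper: values outside the mask are returned unchanged.
    """
    rng = random.Random(seed)
    strength = rng.uniform(0.0, max_strength)  # one random noise level per call
    return [
        [p + strength * rng.gauss(0.0, 1.0) if m else p
         for p, m in zip(p_row, m_row)]
        for p_row, m_row in zip(frame, object_mask)
    ]


# Toy 2x3 "frame" whose object occupies the middle column.
frame = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
object_mask = [[0, 1, 0], [0, 1, 0]]
control = distort_object(frame, object_mask, max_strength=0.8, seed=0)
```

Passing a fixed `seed` makes the sketch reproducible. In a training setting one would presumably draw a fresh strength per sample so that a single signal format covers mild and heavy distortions.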

Problem

Research questions and friction points this paper is trying to address.

Unified object distortion control for diverse video editing tasks

Single representation for multiple editing cues to reduce complexity

Preserving non-edited regions while enabling high-fidelity video manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified object distortion control signal

Copy-form preservation for non-edited regions

Effective training paradigm for video editing
