CrimEdit: Controllable Editing for Counterfactual Object Removal, Insertion, and Movement

📅 2025-09-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a gap in image editing: object removal and insertion models increasingly handle physically grounded effects such as shadows and reflections, but the impact of classifier-free guidance on those effects within a unified model remains largely unexplored. We propose CrimEdit, a diffusion-based framework that jointly trains task embeddings for removal and insertion within a single model and uses them in a classifier-free guidance scheme. This improves the removal of objects together with their effects and enables controllable synthesis of object effects during insertion. Applying the two task prompts to spatially distinct regions further enables object movement (repositioning) within a single denoising step. Contributions include: (1) a single model with jointly trained removal and insertion task embeddings; (2) guidance-based control over object effects; and (3) experiments showing superior object removal, controllable effect insertion, and efficient object movement, without additional training or separate removal and insertion stages.

📝 Abstract

Recent works on object removal and insertion have enhanced their performance by handling object effects such as shadows and reflections, using diffusion models trained on counterfactual datasets. However, the performance impact of applying classifier-free guidance to handle object effects across removal and insertion tasks within a unified model remains largely unexplored. To address this gap and improve efficiency in composite editing, we propose CrimEdit, which jointly trains the task embeddings for removal and insertion within a single model and leverages them in a classifier-free guidance scheme -- enhancing the removal of both objects and their effects, and enabling controllable synthesis of object effects during insertion. CrimEdit also extends these two task prompts to be applied to spatially distinct regions, enabling object movement (repositioning) within a single denoising step. By employing both guidance techniques, extensive experiments show that CrimEdit achieves superior object removal, controllable effect insertion, and efficient object movement without requiring additional training or separate removal and insertion stages.
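The classifier-free guidance step mentioned in the abstract can be illustrated with the standard guidance formula, which extrapolates from an unconditional noise prediction toward a conditioned one. The sketch below is only the generic formula in plain Python. The function name `cfg_combine`, the flat-list representation of the predictions, and the choice of which prediction plays the "unconditional" role are assumptions made for illustration, not details taken from CrimEdit.

```python
def cfg_combine(eps_uncond, eps_task, scale):
    """Generic classifier-free guidance (illustrative, not CrimEdit's code).

    eps_uncond: noise prediction without the task embedding (assumed input)
    eps_task:   noise prediction with the task embedding (assumed input)
    scale:      guidance scale; 0 returns eps_uncond, 1 returns eps_task,
                values above 1 push further in the task direction
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_task)]


# Toy two-element "predictions" standing in for real model outputs.
eps_uncond = [0.0, 1.0]
eps_task = [1.0, 3.0]

plain = cfg_combine(eps_uncond, eps_task, 1.0)   # no extra guidance
guided = cfg_combine(eps_uncond, eps_task, 2.0)  # amplified task direction
```

Under this reading, raising the scale would strengthen the effect of the removal or insertion embedding, which is one plausible way to obtain the controllable effect synthesis the abstract describes.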

Problem

Research questions and friction points this paper is trying to address.

Unified model for object removal and insertion tasks

Controllable synthesis of object effects during editing

Efficient object movement in a single denoising step

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified model jointly trains removal and insertion embeddings

Classifier-free guidance controls object effect synthesis

Single-step object movement using spatial task prompts
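To make the idea of spatial task prompts concrete, the following sketch builds a per-pixel task map in which the object's original location is labeled for removal and its destination for insertion. It is a hypothetical illustration only: the label constants, the `build_task_map` helper, the box format, and the rule that insertion overrides removal where the regions overlap are all assumptions and are not described in the source.

```python
# Hypothetical task labels; the real model uses learned task embeddings.
KEEP, REMOVE, INSERT = 0, 1, 2


def build_task_map(height, width, remove_box, insert_box):
    """Label each pixel with the task assumed to apply there.

    Boxes are (top, left, bottom, right) with exclusive bottom/right.
    Where the boxes overlap, INSERT wins because it is written last
    (an arbitrary choice made for this sketch).
    """
    task_map = [[KEEP] * width for _ in range(height)]
    for label, (top, left, bottom, right) in (
        (REMOVE, remove_box),
        (INSERT, insert_box),
    ):
        # Clamp the box to the image bounds before filling it in.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                task_map[y][x] = label
    return task_map


# Move an object from the left half of a 2x4 image to the right half.
task_map = build_task_map(2, 4, remove_box=(0, 0, 2, 2), insert_box=(0, 2, 2, 4))
```

A map like this could then select the removal or insertion embedding per region, so that both edits are conditioned in the same denoising pass rather than in two separate stages.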
