PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

📅 2024-12-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the slow inference speed, low naturalness, and poor consistency in diffusion-based image object editing, this paper proposes a novel inversion-free and training-free direct pixel-space editing paradigm. Methodologically, it introduces an anchor pixel manipulation mechanism—integrating pixel-level object copying, target-location fusion, and in-place inpainting—alongside multi-level consistency constraints (texture, attribute, and background) and energy-guided latent-space harmonization optimization. Achieving high-fidelity edits in only 16 sampling steps—significantly fewer than the typical 50 steps required by mainstream methods—it attains state-of-the-art performance in both editing quality and efficiency across multiple benchmarks. The core contribution is the first demonstration of consistent object editing without fine-tuning or inversion, enabling joint modification of object position, scale, and composition while preserving fidelity and computational efficiency.

Technology Category

Application Category

📝 Abstract
Recent research explores the potential of Diffusion Models (DMs) for consistent object editing, which aims to modify object position, size, and composition, etc., while preserving the consistency of objects and background without changing their texture and attributes. Current inference-time methods often rely on DDIM inversion, which inherently compromises efficiency and the achievable consistency of edited images. Recent methods also utilize energy guidance which iteratively updates the predicted noise and can drive the latents away from the original image, resulting in distortions. In this paper, we propose PixelMan, an inversion-free and training-free method for achieving consistent object editing via Pixel Manipulation and generation, where we directly create a duplicate copy of the source object at target location in the pixel space, and introduce an efficient sampling approach to iteratively harmonize the manipulated object into the target location and inpaint its original location, while ensuring image consistency by anchoring the edited image to be generated to the pixel-manipulated image as well as by introducing various consistency-preserving optimization techniques during inference. Experimental evaluations based on benchmark datasets as well as extensive visual comparisons show that in as few as 16 inference steps, PixelMan outperforms a range of state-of-the-art training-based and training-free methods (usually requiring 50 steps) on multiple consistent object editing tasks.
Problem

Research questions and friction points this paper is trying to address.

Diffusion Models
Image Editing
Efficiency and Realism
Innovation

Methods, ideas, or system contributions that make the work stand out.

PixelMan
Diffusion Model
Image Editing
🔎 Similar Papers
No similar papers found.