🤖 AI Summary
Existing 2D image editing methods rely on pixel-level manipulations, struggling to simultaneously ensure editing consistency and preserve object identity. To address this, we propose a “2D–3D–2D” framework: first reconstructing the target instance from a single input image into an editable 3D representation; then performing semantically consistent edits in 3D space under rigid transformation constraints; and finally generating high-fidelity 2D outputs via geometry-aware reprojection and diffusion-guided inpainting. This is the first method enabling controllable, single-image-driven image editing based on 3D instance modeling—eliminating geometric distortions and identity drift commonly caused by the absence of 3D priors in conventional approaches. Experiments demonstrate substantial improvements over state-of-the-art methods—including DragGAN and DragDiffusion—in editing consistency, identity preservation, and geometric plausibility.
📝 Abstract
Generative models have achieved significant progress in advancing 2D image editing, demonstrating exceptional precision and realism. However, they often struggle with consistency and object identity preservation due to their inherent pixel-manipulation nature. To address this limitation, we introduce a novel "2D-3D-2D" framework. Our approach begins by lifting 2D objects into 3D representation, enabling edits within a physically plausible, rigidity-constrained 3D environment. The edited 3D objects are then reprojected and seamlessly inpainted back into the original 2D image. In contrast to existing 2D editing methods, such as DragGAN and DragDiffusion, our method directly manipulates objects in a 3D environment. Extensive experiments highlight that our framework surpasses previous methods in general performance, delivering highly consistent edits while robustly preserving object identity.