🤖 AI Summary
This work addresses the challenge of achieving consistent cross-view editing in 3D representations, a limitation of conventional 2D image editing methods. We propose a feed-forward 3D editing framework that requires neither text prompts nor hand-crafted masks. Leveraging a pre-trained multi-view diffusion model, our method constructs an edit-aware flow field in its latent space, conditioned on a pair comprising the original multi-view image and a single user-edited view; this flow field automatically propagates the edit to the unseen views. Our key contributions are: (1) the first fully mask-free, optimization-free, and text-free 3D editing approach; (2) a latent-space flow field conditioned on image pairs, ensuring geometric coherence and preserving appearance identity across views; and (3) a comprehensive evaluation demonstrating high-fidelity editing and strong multi-view consistency across diverse object categories and complex editing tasks.
📝 Abstract
We present EditP23, a method for mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner. In contrast to traditional approaches that rely on text-based prompting or explicit spatial masks, EditP23 enables intuitive edits by conditioning on a pair of images: an original view and its user-edited counterpart. These image prompts are used to guide an edit-aware flow in the latent space of a pre-trained multi-view diffusion model, allowing the edit to be coherently propagated across views. Our method operates in a feed-forward manner, without optimization, and preserves the identity of the original object, in both structure and appearance. We demonstrate its effectiveness across a range of object categories and editing scenarios, achieving high fidelity to the source while requiring no manual masks.
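
To make the feed-forward, pair-conditioned propagation concrete, below is a minimal sketch of how such a scheme could look. It assumes a hypothetical pre-trained multi-view diffusion interface; every name (`mv_model`, `encode`, `sample_initial_noise`, `timesteps`, `denoise_step`, `flow_weight`, `decode`) is an illustrative placeholder, not EditP23's actual API, and the latent-difference "flow" is one plausible instantiation of the edit-aware guidance described above.

```python
# Minimal sketch of pair-conditioned, feed-forward edit propagation.
# All names below (mv_model, encode, sample_initial_noise, timesteps,
# denoise_step, flow_weight, decode) are hypothetical placeholders for a
# pre-trained multi-view diffusion interface; they are not EditP23's API.
import torch

@torch.no_grad()
def propagate_edit(mv_model, source_views, edited_view, edited_idx, num_steps=50):
    """Propagate a 2D edit made on one view to all views of a multi-view image.

    source_views: original multi-view image, shape (V, C, H, W)
    edited_view:  the user-edited counterpart of one view, shape (C, H, W)
    edited_idx:   index of the edited view within source_views
    """
    # Encode the original views and the edited view into the model's latent space.
    z_src = mv_model.encode(source_views)               # (V, c, h, w)
    z_edit = mv_model.encode(edited_view.unsqueeze(0))  # (1, c, h, w)

    # Latent difference between the edited view and its original counterpart,
    # used here as an edit-aware direction ("flow") shared across views.
    flow = z_edit - z_src[edited_idx:edited_idx + 1]

    # Feed-forward denoising (no per-object optimization): each step is
    # conditioned on the (original, edited) latent pair and nudged along the
    # edit direction so all views change coherently.
    z = mv_model.sample_initial_noise(z_src.shape)
    for t in mv_model.timesteps(num_steps):
        z = mv_model.denoise_step(z, t, cond=(z_src, z_edit))
        z = z + mv_model.flow_weight(t) * flow

    # Decode back to images: the edited multi-view set.
    return mv_model.decode(z)
```

Because the loop is a fixed number of denoising steps with no gradient updates, the edit is propagated in a single feed-forward pass, matching the optimization-free behavior described in the abstract.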