🤖 AI Summary
This work addresses the longstanding challenge in image editing of simultaneously achieving precise object-level control and 3D consistency. Existing 2D methods lack geometric awareness, while 3D approaches often rely on time-consuming optimization or incomplete reconstructions. To overcome these limitations, we propose an efficient interactive editing framework that integrates 3D Gaussian Splatting (3DGS) with control-point-driven deformation. Given a single input image, our method first constructs an editable 3DGS representation of the target object using an image-to-3D generator. Users then manipulate control points, triggering a graph-based non-rigid deformation governed by as-rigid-as-possible (ARAP) constraints to produce physically plausible geometry edits. A composite diffusion module then jointly harmonizes lighting, color, and boundaries for seamless reintegration into the source image. Without per-scene 3D reconstruction or iterative optimization, our approach significantly outperforms state-of-the-art 2D and 3D baselines on the KID, LPIPS, and SIFID metrics and in user studies, enabling high-fidelity, fine-grained, and 3D-consistent object-level editing.
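The paper describes the control-point-driven deformation only at a high level, so the sketch below is a minimal illustration of one way such a step could look: a uniform k-nearest-neighbor graph over the Gaussian centers and a standard local-global alternation in the spirit of ARAP (per-point rotation fit via SVD, then a constrained Laplacian solve that pins the dragged control points). The function name, the uniform edge weights, and the solver choice are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch: control-point-driven, ARAP-style graph deformation of 3DGS centers.
# Assumptions: centers are an (N, 3) array; uniform kNN weights and a basic
# local-global alternation stand in for the paper's (unspecified) formulation.
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import spsolve
from scipy.spatial import cKDTree


def arap_deform(points, handle_idx, handle_targets, k=8, iters=10):
    """Move points[handle_idx] to handle_targets while keeping the rest of the
    point set as rigid as possible."""
    n = len(points)

    # 1) kNN graph over the Gaussian centers (uniform edge weights).
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k + 1)
    nbrs = nbrs[:, 1:]  # drop self-neighbor

    # 2) Graph Laplacian L with w_ij = 1 on directed kNN edges.
    L = lil_matrix((n, n))
    for i in range(n):
        for j in nbrs[i]:
            L[i, i] += 1.0
            L[i, j] -= 1.0
    L = csr_matrix(L)
    free = np.setdiff1d(np.arange(n), handle_idx)
    L_ff = L[free][:, free]          # free-free block
    L_fc = L[free][:, handle_idx]    # free-constrained block

    deformed = points.copy()
    deformed[handle_idx] = handle_targets
    rot = np.tile(np.eye(3), (n, 1, 1))

    for _ in range(iters):
        # Local step: best-fit rotation per point via SVD of the edge covariance.
        for i in range(n):
            P = (points[nbrs[i]] - points[i]).T      # rest-pose edges, 3 x k
            Q = (deformed[nbrs[i]] - deformed[i]).T  # current edges, 3 x k
            U, _, Vt = np.linalg.svd(P @ Q.T)
            if np.linalg.det(Vt.T @ U.T) < 0:        # avoid reflections
                Vt[-1] *= -1
            rot[i] = Vt.T @ U.T

        # Global step: solve L p' = b with the handles pinned to their targets.
        b = np.zeros((n, 3))
        for i in range(n):
            for j in nbrs[i]:
                b[i] += 0.5 * (rot[i] + rot[j]) @ (points[i] - points[j])
        rhs = b[free] - L_fc @ deformed[handle_idx]
        deformed[free] = np.column_stack(
            [spsolve(L_ff, rhs[:, d]) for d in range(3)]
        )
    return deformed
```

In a pipeline like the one summarized above, the deformed centers would replace the original Gaussian positions before the edited object is re-rendered and composited back into the image.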
📝 Abstract
Achieving precise, object-level control in image editing remains challenging: 2D methods lack 3D awareness and often yield ambiguous or implausible results, while existing 3D-aware approaches rely on heavy optimization or incomplete monocular reconstructions. We present ObjectMorpher, a unified, interactive framework that converts ambiguous 2D edits into geometry-grounded operations. ObjectMorpher lifts target instances into an editable 3D Gaussian Splatting (3DGS) representation with an image-to-3D generator, enabling fast, identity-preserving manipulation. Users drag control points; a graph-based non-rigid deformation with as-rigid-as-possible (ARAP) constraints ensures physically sensible shape and pose changes. A composite diffusion module harmonizes lighting, color, and boundaries for seamless reintegration. Across diverse categories, ObjectMorpher delivers fine-grained, photorealistic edits with superior controllability and efficiency, outperforming 2D drag-based and 3D-aware baselines on KID, LPIPS, and SIFID as well as in user preference studies.
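The composite diffusion module is likewise described only at a high level. As a hedged stand-in, the sketch below uses an off-the-shelf latent-diffusion inpainting pipeline from the diffusers library to re-synthesize a thin band around the pasted object so that lighting, color, and boundaries blend with the background; the checkpoint id, band width, prompt, and function name are placeholders, not the authors' module.

```python
# Hedged sketch: diffusion-based boundary harmonization after compositing the
# rendered, deformed object back into the source image. The checkpoint, band
# width, and prompt below are illustrative assumptions.
import numpy as np
import torch
from PIL import Image
from scipy.ndimage import binary_dilation, binary_erosion
from diffusers import StableDiffusionInpaintPipeline


def harmonize_composite(composite: Image.Image, object_mask: np.ndarray,
                        prompt: str = "a photo", band: int = 12) -> Image.Image:
    """Re-synthesize a thin band around the object boundary so that lighting,
    color, and edges blend with the surrounding background."""
    # Boundary band = dilated mask minus eroded mask of the binary object mask.
    dil = binary_dilation(object_mask > 0, iterations=band)
    ero = binary_erosion(object_mask > 0, iterations=band)
    band_mask = Image.fromarray(((dil & ~ero).astype(np.uint8)) * 255)

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",  # placeholder checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")
    out = pipe(prompt=prompt, image=composite, mask_image=band_mask,
               num_inference_steps=30, guidance_scale=7.5).images[0]
    return out
```

Restricting the inpainting mask to a narrow boundary band keeps the object's identity and geometry fixed while letting the diffusion model resolve seams, shading, and color mismatches, which is the role the abstract attributes to the composite diffusion module.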